Loon – Finding the Right Cancer Drug with Microscopy and Visualization OR How to use Exemplars to Effectively Browse Large Complex Data Sources
In this blog post, we want to tell two stories: First, we want to explore how visualization can be used to select cancer treatments. And second, we want to tell a more technical story of how you can use exemplars to deal with large datasets that you can’t easily summarize.
When you or a person close to you are diagnosed with cancer, you certainly want them to get the best drug available for their situation. However, picking the right cancer drug for a particular patient is exceedingly difficult. Research on the topic – under the umbrella term precision medicine – is ongoing. Common approaches include genetic profiling or advanced imaging. Our collaborators, however, are developing a different approach: just trying a lot of drugs at the same time and seeing what happens. The trick is to not give the drugs to the patient directly, but instead take cells from the patient’s tumor, and watch them grow (or die) under a microscope!
To see whether a drug works for a particular tumor, our collaborators take pictures of cancerous cells with microscopes – a lot of pictures! A typical dataset we worked with would have over a hundred thousand cells photographed over the course of a day — resulting in over a million data points. How do you analyze a dataset of a million cells? To answer these questions, we developed Loon, a visualization tool for cell microscopy data. For a quick introduction to the work check out the following video:
The first thing we can do is measure the attributes of each cell and look at the distribution of those attributes. This approach is something we did early on in our development. These plots are useful, especially when paired with linked brushing as shown below.
Statistical visualizations of cells and cell growth trajectories. A brush highlights large cells, and the other plots show the distribution of large cells with respect to the attribute shown.
To get a clearer view for how cell growth is affected by different drugs we faceted the data by experimental condition and plotted the average mass of cells over time. This gives a high-level answer to our question for how cancer cells are affected by different drugs. For example you can see that the growth rate for the drug “dox” steadily decreases as concentration increases, whereas 4HT changes dramatically between 2 and 20 micromolars. At a first glance this seems to answer our question, identifying the most suitable drug. However, a lot of information is unavailable in this view. Why is there a decrease in mass? Are cells truly dying, or is there something wrong with the complex processes required to produce this data?
Growth rates for cancer cells treated with five drugs and two controls (the rows) at six different concentrations (the columns). For each condition the area chart shows the average mass of all the cells over time. The bold line shows the same information but for selected cells.
Using Exemplars to Analyze Large Amounts of Complex Data
To get at the data lost in the aggregate statistical views let’s consider a single image of cells. To most people this image doesn’t mean much beyond a collection of blobs with different amounts of “blobby-ness”. But our expert collaborators see cells in different stages of their lifecycle, including happily growing cells and cells that are dying. They also see potential problems in their image processing, they see when there are problems with their experimental setup. There is a lot of information, even in a single image. The problem is: there are a lot of images they need to check out. A single experiment that runs for 24 hours can generate over 50,000 images. That is simply too many to inspect individually.
An image of cancer cells and their segmentation.
So how do we preserve the rich contextual information in the images while still providing some way to understand the large dataset? The solution we developed is to use exemplars. We automatically pick representative cells based on a user-specified attribute, such as cell size. For example, we sample from the size distribution to show a typical small cell, a medium cell, and a large cell. We then show a cut-out of each exemplar cell at multiple time points, so that an analyst can inspect these representative examples of different types of cells.
Exemplars for three different drug treatments (conditions). Each condition shows a small, a medium, and a large exemplar cell and its growth trajectory. The histograms on the left show the distribution of the size (mass) for each condition. The line charts show the growth trajectory of the exemplar cells in the condition.
This approach of carefully selecting representative data points and showing them in their “raw” format could also be applied to other kinds of data. This is particularily valuable if the data has contextual aspects (like the shape of the cells and the quality of the segmentation in our example) that are not easy to fully quantify or summarize, and the dataset is large enough where manual inspection is no longer practical.
We published a paper describing this technique at last year’s VIS conference. We’re also very excited that the paper won an honorable mention for best paper! If you’re interested, check out the details in the paper or watch a recording of the conference presentation.
Bonus content! We have had a few people ask where the name Loon comes from for the tool. This is maybe more common because it is not explained in the paper. During the brainstorming process we thought about our collaborators’ microscopes. These measure the phase shift of light as it passes through cells to calculate the mass of those cells. Since the moon emits light, and also goes through phases we tried a number of lunar-esque names. Eventually we just dropped this to “Loon”. To me this holds an extra meaning because Loons are the state bird of Minnesota, my home state.