Presented at SLAS2019, Washington, DC
When automating the analysis of high-content screening images, a key step is reducing each object image (e.g., a cell) to a vector of features that describes that object for subsequent comparison with other objects. In classical computer vision, many of these features have an explicit biological meaning (e.g., cell size, intensity, or aspect ratio).
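As a minimal sketch of such classical per-object feature extraction (not the poster's actual pipeline; the synthetic image and the particular feature choices are illustrative assumptions), one can segment objects and compute size, mean intensity, and aspect ratio per object:

```python
import numpy as np
from scipy import ndimage as ndi

# Synthetic intensity image with two bright "cells" (illustrative stand-in
# for a real high-content screening image)
img = np.zeros((64, 64))
img[10:20, 10:25] = 1.0   # elongated object
img[40:52, 40:52] = 0.5   # square object

# Segment by thresholding and label connected objects
labels, n_objects = ndi.label(img > 0)

def object_features(obj_id):
    """Reduce one labeled object to a vector: area, mean intensity, aspect ratio."""
    ys, xs = np.nonzero(labels == obj_id)
    area = ys.size
    mean_intensity = img[ys, xs].mean()
    height = ys.max() - ys.min() + 1
    width = xs.max() - xs.min() + 1
    aspect_ratio = height / width
    return np.array([area, mean_intensity, aspect_ratio])

# One feature vector per object: the input to downstream analysis
features = np.stack([object_features(i) for i in range(1, n_objects + 1)])
print(features.shape)  # (2, 3): one row per cell, one column per feature
```

In practice, assay-specific tuning of the segmentation and feature definitions (which the poster contrasts with automated deep-learning extraction) happens at exactly this stage.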
Depending on the research question, the feature vectors are then subjected to univariate or multivariate analysis. Nonlinear dimensionality reduction techniques such as t-SNE and its variants are particularly useful for representing or embedding high-dimensional data in two or three dimensions. Ideally, such visualizations should clearly separate the different phenotypes observed in the experiment. However, the embedding quality relies heavily on the quality of the input features derived from the HCS images.
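The embedding step can be sketched with scikit-learn's t-SNE implementation; here, randomly generated clusters stand in for per-cell feature vectors of two phenotypes (an assumption for illustration, not the poster's data):

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
# Two synthetic "phenotypes" as separated clusters in a 50-D feature space
features = np.vstack([
    rng.normal(0.0, 1.0, size=(100, 50)),
    rng.normal(5.0, 1.0, size=(100, 50)),
])

# perplexity balances local vs. global structure and must be < n_samples
embedding = TSNE(n_components=2, perplexity=30,
                 random_state=0).fit_transform(features)
print(embedding.shape)  # (200, 2): one 2-D point per object
```

Well-separated input features yield well-separated point clouds in the 2-D map, which is the embedding-quality criterion the poster compares across feature sets.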
In this poster, we use a high-content screening translocation assay to compare the quality of embeddings produced from features extracted by classical versus deep-learning-based approaches. We compare them on their ability to separate phenotypes and their robustness against batch effects. We show that while classical features, deep-learning-derived features, and a combination of both all produce excellent results, attaining this level of quality with classical features requires expert tuning of the feature extraction process to the assay of interest. In contrast, deep-learning-based feature extraction is fully automated and requires no expert knowledge.
Using deep-learning-based features, we observe that embedding quality depends on the network architecture: standard CNN architectures perform poorly, while tailored architectures outperform classical methods. Finally, we show that for embedding and visualization, adding classical features to the deep-learning-based features does not increase resolution and is thus unnecessary.
L. J. P. van der Maaten and G. E. Hinton. Visualizing High-Dimensional Data Using t-SNE. Journal of Machine Learning Research, 9(Nov):2579-2605, 2008.