Benchmarking feature selection methods for compressing image information in high-content screening
October 26, 2021
Biopharmaceutical drug discovery is today a highly automated, high-throughput endeavor in which many screening technologies produce a high-dimensional measurement per sample. A striking example is High Content Screening (HCS), which uses automated microscopy to systematically access the wealth of information contained in biological assays. Exploiting HCS to its full potential traditionally requires extracting a large number of features from the images to capture as much information as possible, then applying algorithmic analysis and complex data visualization to render this high-dimensional data into interpretable and instructive information for guiding drug development. In this process, automated feature selection methods condense the feature set, removing uninformative or redundant information and making the result more meaningful. We compare 12 state-of-the-art feature selection methods (both supervised and unsupervised) by systematically testing them on two HCS datasets from drug screening imaging assays of high practical relevance. Using standard plate-, assay-, and compound-level statistics on the final results as evaluation metrics, we assess the generalizability and importance of the selected features with automated machine learning (AutoML) to achieve an unbiased evaluation across methods. The results provide practical guidance on experiment design, on the optimal size of a reduced feature set, and on the choice of feature selection method, both in situations where useful experimental control states are available (enabling the use of supervised algorithms) and in situations where such controls are unavailable and unsupervised techniques must be used.
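To make the distinction between the supervised and unsupervised settings concrete, the following is a minimal sketch and not one of the 12 benchmarked methods. It assumes a toy table of per-well features with hypothetical names. As the supervised criterion it ranks features by the Z'-factor between positive and negative control wells, a common plate-level statistic chosen here purely for illustration. As the unsupervised criterion it uses a simple greedy correlation filter that needs no controls.

```python
from statistics import mean, stdev


def z_prime(pos, neg):
    """Z'-factor of one feature: separation of positive and negative controls."""
    gap = abs(mean(pos) - mean(neg))
    if gap == 0:
        return float("-inf")  # controls are indistinguishable on this feature
    return 1.0 - 3.0 * (stdev(pos) + stdev(neg)) / gap


def pearson(x, y):
    """Pearson correlation of two equally long value lists (0.0 if undefined)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0


def select_supervised(features, is_pos, k):
    """Supervised: keep the k features that best separate the control states."""
    scores = {}
    for name, values in features.items():
        pos = [v for v, p in zip(values, is_pos) if p]
        neg = [v for v, p in zip(values, is_pos) if not p]
        scores[name] = z_prime(pos, neg)
    return sorted(scores, key=scores.get, reverse=True)[:k]


def select_unsupervised(features, max_corr=0.95):
    """Unsupervised: drop constant features and near-duplicates of kept ones."""
    kept = []
    for name, values in features.items():
        if len(set(values)) < 2:
            continue  # a constant feature carries no information
        if all(abs(pearson(values, features[k])) < max_corr for k in kept):
            kept.append(name)
    return kept


# Hypothetical toy data: six wells, the last three are positive controls.
features = {
    "nucleus_area": [10.0, 11.0, 10.5, 20.0, 21.0, 20.5],
    "nucleus_area_px": [100.0, 110.0, 105.0, 200.0, 210.0, 205.0],  # rescaled copy
    "background": [5.0, 7.0, 6.0, 6.5, 5.5, 7.5],
    "constant": [1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
}
is_pos = [False, False, False, True, True, True]

top_supervised = select_supervised(features, is_pos, k=2)
kept_unsupervised = select_unsupervised(features)
```

On this toy table the supervised ranking favors the two area features because they separate the controls, while the unsupervised filter keeps only one of them and has no way of knowing that `background` is uninformative. This is the kind of trade-off between the two settings that the benchmark examines on real data.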
Our team continues to develop and improve the technology underlying Genedata Imagence.