Unsupervised Phenotype Discovery in High-Content Imaging via Archetypes and Self-Supervised Learning
October 27, 2025
Identifying novel therapeutic candidates for complex diseases remains a major challenge in modern drug discovery. To address this, biopharmaceutical research increasingly relies on automated, high-throughput screening assays that use cell culture models to evaluate thousands of compounds in parallel. However, the volume of imaging data these assays generate makes systematic expert review difficult, leaving phenotype discovery and classification dependent on extensive, and often biased, manual curation.
A common strategy to mitigate this issue is archetypal analysis, which identifies extreme, representative phenotypes (archetypes) within a dataset and describes each sample as a mixture of them. In this work, we introduce an end-to-end deep learning framework that simultaneously learns embeddings from high-content images and uncovers phenotypic structures without supervision [1]. Building on these representations, we apply self-supervised learning to construct a phenotypic embedding space, enabling intuitive visual exploration and downstream assay analysis.
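For readers unfamiliar with archetypal analysis, the following is a minimal sketch of the classical, linear formulation, not the deep framework of [1]. It assumes feature vectors (for example, image embeddings) are already available as rows of a matrix X, and approximates X by A (B X), where the rows of A and B are constrained to the probability simplex: B builds each archetype as a convex combination of samples, and A expresses each sample as a mixture of archetypes. The function names, the alternating projected-gradient solver, and the default parameters are illustrative assumptions.

```python
import numpy as np


def project_rows_to_simplex(M):
    """Euclidean projection of every row of M onto the probability simplex."""
    n, k = M.shape
    u = -np.sort(-M, axis=1)                 # each row sorted in descending order
    css = np.cumsum(u, axis=1) - 1.0
    j = np.arange(1, k + 1)
    rho = (u - css / j > 0).sum(axis=1)      # support size of each projected row
    theta = css[np.arange(n), rho - 1] / rho
    return np.maximum(M - theta[:, None], 0.0)


def archetypal_analysis(X, k, n_iter=300, seed=0):
    """Classical archetypal analysis by alternating projected gradient steps.

    Minimises 0.5 * ||X - A @ B @ X||_F^2 with the rows of A (n x k) and
    B (k x n) on the simplex. Returns A, B, the archetypes Z = B @ X, and
    the loss after initialisation and after every iteration.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    A = project_rows_to_simplex(rng.random((n, k)))
    B = project_rows_to_simplex(rng.random((k, n)))
    x_norm_sq = np.linalg.norm(X, 2) ** 2    # squared spectral norm of X

    def loss(A, B):
        return 0.5 * np.linalg.norm(X - A @ (B @ X)) ** 2

    losses = [loss(A, B)]
    for _ in range(n_iter):
        # Update mixture weights A with archetypes fixed (step = 1 / Lipschitz bound).
        Z = B @ X
        grad_A = (A @ Z - X) @ Z.T
        step_A = 1.0 / (np.linalg.norm(Z @ Z.T, 2) + 1e-12)
        A = project_rows_to_simplex(A - step_A * grad_A)

        # Update archetype weights B with A fixed.
        grad_B = A.T @ (A @ (B @ X) - X) @ X.T
        step_B = 1.0 / (np.linalg.norm(A.T @ A, 2) * x_norm_sq + 1e-12)
        B = project_rows_to_simplex(B - step_B * grad_B)

        losses.append(loss(A, B))
    return A, B, B @ X, losses
```

Calling `archetypal_analysis(X, k=3)` on an (n, d) array returns mixture weights whose rows sum to one, so each sample can be read as a blend of the k archetypes. Because the step sizes are conservative, this sketch may need many iterations to converge on real data.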
Comprehensive experiments on industry-relevant assays demonstrate that our approach outperforms existing unsupervised and supervised methods, providing a scalable and unbiased pipeline for drug screening and functional genomics [2].
[1] Wieser, M., et al. "Revisiting Deep Archetypal Analysis for Phenotype Discovery in High Content Imaging." 2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV). IEEE Computer Society, 2025.
[2] Siegismund, D., et al. "Self-supervised representation learning for high-content screening." International Conference on Medical Imaging with Deep Learning. PMLR, 2022.
