Enhancing Biopharma ROI with In-House Long-Read Sequencing
June 16, 2025
Raed Hmadi
In the dynamic landscape of biopharmaceutical R&D, next-generation sequencing (NGS) has emerged as an unbiased and highly sensitive technology. It is increasingly being adopted as a multi-attribute method (MAM) to provide information on critical quality attributes (CQAs) such as identity, integrity, and safety. Long-read sequencing has further advanced the field of NGS by enabling the sequencing of long DNA fragments, making it possible to resolve structural variants and complex genomic regions that were previously difficult to sequence. Integrating long-read NGS-based assays in-house can significantly streamline biopharma workflows by replacing multiple legacy assays with a single quality control assay that is more efficient and cost-effective.
What is Long-Read Sequencing?
Long-read sequencing is a genomic sequencing technology that enables the reading of long stretches of DNA or RNA, from 1,000 to over 20,000 bases, in a single pass. Unlike short-read sequencing, which analyses shorter fragments (typically 50 to 300 bases), long-read sequencing captures entire regions in a single read, significantly reducing the need for complex assembly.
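To make these read-length ranges concrete, the following Python sketch computes basic length statistics from the sequences in a FASTQ file. These include N50, the read length at which reads of that length or longer contain half of all sequenced bases. It is a minimal illustration that assumes plain four-line FASTQ records with no line wrapping. The function names and the 300-base cutoff (the upper end of the typical short-read range quoted above) are hypothetical choices, not part of any particular platform's tooling.

```python
"""Read-length summary for a FASTQ file (illustrative sketch)."""
from typing import Dict, Iterable, Iterator, List


def fastq_read_lengths(lines: Iterable[str]) -> Iterator[int]:
    """Yield sequence lengths from unwrapped, four-line FASTQ records."""
    for i, line in enumerate(lines):
        if i % 4 == 1:  # line 2 of each record holds the sequence
            yield len(line.strip())


def n50(lengths: List[int]) -> int:
    """Length L such that reads of length >= L contain at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # no reads


def summarize(lengths: List[int], short_cutoff: int = 300) -> Dict[str, int]:
    """Basic statistics; short_cutoff is a hypothetical short-read threshold."""
    return {
        "reads": len(lengths),
        "bases": sum(lengths),
        "max": max(lengths, default=0),
        "n50": n50(lengths),
        "short_read_sized": sum(1 for x in lengths if x <= short_cutoff),
    }


if __name__ == "__main__":
    # With a real file (hypothetical path):
    #   with open("reads.fastq") as fh:
    #       lengths = list(fastq_read_lengths(fh))
    toy_lengths = [2000, 15000, 250, 8000]
    print(summarize(toy_lengths))
```

For the toy lengths, the N50 is 15,000 because that single read already holds more than half of the 25,250 bases. A run dominated by reads in the short-read range would instead report a low N50 and a high `short_read_sized` count.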
Because long reads span entire regions and can be generated from native, unamplified DNA or RNA molecules, long-read sequencing is particularly valuable for studying complex genomic regions, such as those with high GC content, repetitive elements, or structural variation. It has become a powerful quality control tool, especially in areas where short-read technologies often fall short. By preserving the integrity of native sequences, long-read approaches provide richer, more reliable genetic insights.
Advantages of Long-Read Sequencing
Long-read sequencing, provided by platforms such as Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT), offers numerous benefits over short-read sequencing that extend beyond simply producing longer reads [1]. Given these advantages, long-read sequencing technologies have ushered in a new era of DNA and RNA sequencing.
- Amplification-free sequencing: Both platforms offer real-time sequencing without the need for PCR amplification, reducing bias and preserving native DNA structure.
- Direct epigenetic insights: Long-read sequencing can directly detect base modifications such as DNA methylation, eliminating the need for chemical treatments such as bisulfite conversion.
- Direct RNA sequencing: ONT enables direct sequencing of RNA molecules, bypassing reverse transcription and improving isoform-level transcriptome analysis [2].
- Increased accuracy in complex regions: Long reads are better at resolving repetitive sequences, structural variants, and highly polymorphic regions that short reads often miss.
Applications in Biopharma
Long-read sequencing is being applied in a number of fields, including biopharma [3], agricultural biotechnology [4,5], and clinical diagnostics [6,7]. In biopharma, key applications include clone selection, cell line characterization, genetic stability studies, biosafety testing, and lot release testing [8]. It is also used to evaluate a wide range of biotherapeutic modalities, including antibodies, vaccines, and cell and gene therapies [9].
Long-read sequencing is highly beneficial for safeguarding the integrity of producer cell lines, such as those stored in master cell banks. It provides accurate base calls and identifies variants relative to the reference genome, even in GC-rich and repetitive regions. Long-read sequencing is also well suited for verifying the genetic identity of engineered cells, such as Chinese hamster ovary (CHO) cells, over time, capturing genetic drift and helping to ensure consistent performance in bioproduction. Additionally, scientists can readily verify gene integration sites and determine gene copy numbers, enabling a comprehensive investigation of cell lines [3].
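One common way to estimate gene copy number from sequencing data is to compare read depth over the integrated gene with read depth over a region of known copy number. The Python sketch below illustrates that ratio under simplifying assumptions. It assumes coverage is uniform, that per-position depths have already been computed (for example, from reads aligned to the host genome plus the transgene), and that the reference region is present at two copies per cell. All names and values are hypothetical. A production assay would also need to account for coverage bias and for the karyotype of the specific cell line.

```python
"""Read-depth ratio estimate of transgene copy number (illustrative sketch)."""
from statistics import mean
from typing import Sequence


def estimate_copy_number(
    transgene_depth: Sequence[float],
    reference_depth: Sequence[float],
    reference_copies: int = 2,  # hypothetical: reference region assumed diploid
) -> float:
    """Estimate copies per cell as (transgene depth / reference depth) * reference copies."""
    if not transgene_depth or not reference_depth:
        raise ValueError("depth profiles must be non-empty")
    ref = mean(reference_depth)
    if ref <= 0:
        raise ValueError("reference region has no coverage")
    return mean(transgene_depth) / ref * reference_copies


if __name__ == "__main__":
    # Hypothetical per-position depths over a transgene and a reference region.
    print(estimate_copy_number([118, 120, 122], [29, 30, 31]))  # 8.0
```

Here the transgene is covered at four times the depth of a two-copy reference region, so the estimate is eight copies per cell.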
Long-read NGS-based assays are also valuable for developing cell and gene therapies. They detect fusion or truncation events during bioprocess development, helping to ensure product safety and efficacy. They can also fully sequence plasmids and vectors to verify the orientation of gene inserts, promoters, or variants, even across repetitive regions or dimers [10]. By thoroughly characterizing adeno-associated virus (AAV) vector genomes for mutations or deletions, including in the inverted terminal repeats (ITRs), they help ensure that correct, error-free genomes are packaged into capsids [11].
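As a simplified illustration of orientation checking, the Python sketch below tests whether a known insert sequence occurs in a circular plasmid consensus in the forward or reverse-complement orientation. It assumes an exact match and a plasmid consensus already assembled from long reads. Real verification would use alignment that tolerates mismatches and indels. The function names and sequences are hypothetical.

```python
"""Exact-match insert orientation check on a circular plasmid (illustrative sketch)."""

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def insert_orientation(plasmid: str, insert: str) -> str:
    """Return 'forward', 'reverse', or 'absent' for an exact insert match.

    The plasmid is circular, so the linear record is extended by
    len(insert) - 1 bases to catch matches that wrap around its start.
    """
    if not insert:
        raise ValueError("insert must be non-empty")
    plasmid, insert = plasmid.upper(), insert.upper()
    circular = plasmid + plasmid[: len(insert) - 1]
    if insert in circular:
        return "forward"
    if reverse_complement(insert) in circular:
        return "reverse"
    return "absent"


if __name__ == "__main__":
    # Hypothetical toy sequences; the second plasmid carries the insert flipped.
    print(insert_orientation("TTTTATGCCCTTTT", "ATGCCC"))  # forward
    print(insert_orientation("TTTTGGGCATTTTT", "ATGCCC"))  # reverse
    print(insert_orientation("CCCTTTTTATG", "ATGCCC"))     # forward (wraps the origin)
```

Extending the sequence by one base less than the insert length is enough to catch any match that wraps around the end of the linear record, without introducing duplicate full-length copies.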
International regulatory bodies have recently recommended the use of NGS for biosafety testing to replace traditional in vitro and in vivo assays, especially in GMP environments. This shift is driven by the need for more accurate, comprehensive, and efficient testing to ensure the safety and efficacy of therapeutic products. Long-read NGS-based assays offer significant advantages here. By sequencing native DNA and RNA in long reads, they provide more complete coverage of viral, bacterial, and mycoplasma genomes, and therefore a more detailed picture of contaminating genetic material. Long-read sequencing also complements short-read NGS by identifying low-abundance contaminants and those with complex genomic structures, improving both the detection of known contaminants and the discovery of novel pathogens.

Challenges of Long-Read Sequencing
While the advantages of long-read sequencing are numerous, implementing it in-house presents significant challenges for biopharma organizations, particularly in high-throughput or cost-sensitive environments. The key hurdles that organizations must consider are outlined below.
Data Storage and Management
The large, information-rich datasets generated by long-read sequencing can far exceed the volume produced by short-read platforms. These large files require an advanced data storage infrastructure that can handle both the size and complexity of the data. Efficient data management solutions are essential to ensure scalability, maintain performance, and enable secure access and sharing. Without such systems, organizations can experience data bottlenecks, slow processing times, and increased infrastructure maintenance costs.
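For capacity planning, a rough estimate of uncompressed FASTQ size follows from the format itself. Each base is stored as one sequence character and one quality character. Each read adds a header line, a `+` separator line, and newlines. The Python sketch below is a back-of-the-envelope calculation, not a measurement. The genome size, coverage, read length, header size, and compression ratio are all hypothetical inputs. Platform-specific outputs such as raw signal files, which can be considerably larger, are not included.

```python
"""Back-of-the-envelope FASTQ size estimate (illustrative sketch)."""


def estimate_fastq_bytes(
    genome_size_bp: int,
    coverage: float,
    mean_read_length: int,
    header_bytes: int = 60,         # hypothetical average header length
    compression_ratio: float = 1.0,  # 1.0 means uncompressed
) -> float:
    """Approximate FASTQ size in bytes for a sequencing run.

    Per base: one sequence character and one quality character.
    Per read: a header line, a one-character '+' line, and four newlines.
    """
    if mean_read_length <= 0 or compression_ratio <= 0:
        raise ValueError("read length and compression ratio must be positive")
    total_bases = genome_size_bp * coverage
    n_reads = total_bases / mean_read_length
    per_read_overhead = header_bytes + 1 + 4
    uncompressed = 2 * total_bases + n_reads * per_read_overhead
    return uncompressed / compression_ratio


if __name__ == "__main__":
    # Hypothetical run: 2.4 Gb genome, 30x coverage, 15 kb mean read length.
    size = estimate_fastq_bytes(2_400_000_000, 30, 15_000)
    print(f"~{size / 1e9:.1f} GB uncompressed")  # ~144.3 GB uncompressed
```

Because the per-base term dominates, longer reads barely reduce the total. The size scales almost linearly with genome size and coverage, and it is multiplied again by the number of samples and retained runs.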
Data Validation and Reliability
While the accuracy of long-read sequencing has improved significantly, especially with high-fidelity (HiFi) reads, ensuring the reliability of the data often requires additional validation steps. Unlike short-read data, which benefit from well-established workflows, long-read data may require supplemental sequencing or computational validation to confirm results, particularly in regulated environments such as clinical genomics or biopharma manufacturing.
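Read-level quality checks are one example of such computational validation. FASTQ quality characters encode Phred scores, Q = -10 log10(p), where p is the probability that a base call is wrong. Q20 therefore corresponds to 99% accuracy, the threshold commonly used to define PacBio HiFi reads. The Python sketch below computes a read's mean quality by averaging error probabilities rather than Phred values, because averaging the Phred values directly would overstate accuracy. It then applies a minimum-quality filter. It assumes the standard Phred+33 encoding. The Q20 cutoff is illustrative, not a validated acceptance criterion.

```python
"""Probability-aware mean read quality for FASTQ records (illustrative sketch)."""
import math


def phred_to_error(q: float) -> float:
    """Convert a Phred score to the probability that the base call is wrong."""
    return 10 ** (-q / 10)


def error_to_phred(p: float) -> float:
    """Convert an error probability back to a Phred score."""
    return -10 * math.log10(p)


def mean_read_quality(qual: str, offset: int = 33) -> float:
    """Mean quality of one read, averaging error probabilities (Phred+33)."""
    if not qual:
        raise ValueError("empty quality string")
    errors = [phred_to_error(ord(ch) - offset) for ch in qual]
    return error_to_phred(sum(errors) / len(errors))


def passes_quality_filter(qual: str, min_q: float = 20.0) -> bool:
    """True if the probability-averaged quality meets min_q (hypothetical cutoff)."""
    return mean_read_quality(qual) >= min_q


if __name__ == "__main__":
    # 'I' encodes Q40 and '+' encodes Q10 under Phred+33.
    print(mean_read_quality("I+"))          # about 13.0, not 25
    print(passes_quality_filter("I" * 10))  # True (Q40 throughout)
```

For example, a two-base read with one Q40 and one Q10 call has an arithmetic mean Phred score of 25. Its probability-averaged quality is only about Q13, so it fails the Q20 filter.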
Specialized Bioinformatics Tools and Expertise
The complexity of long-read data demands highly specialized bioinformatics tools and skilled personnel to perform tasks such as assembly, variant calling, and structural variant analysis. Many existing tools designed for short-read data are not optimized for long-read outputs, requiring new pipelines and often steep learning curves. This presents a resource challenge for biopharma organizations lacking in-house bioinformatics capabilities.
Risks and Limitations of Outsourcing
Outsourcing long-read NGS-based assays presents several strategic and operational challenges. It often requires a significant investment of time and resources, can delay project timelines, and poses intellectual property (IP) risks. Analytical R&D teams may not have full access to raw sequencing data, limiting their ability to perform in-depth analyses or respond to unexpected findings, an issue that can complicate regulatory submissions. In addition, inconsistencies in assay standardization, transparency, and data quality across external providers can undermine decision-making. Communication gaps, logistical barriers, and challenges with data transfer protocols and quality control further impede efficiency, increasing overall project complexity and cost.
Genedata Selector Enables In-House Long-Read Sequencing Workflows
Genedata Selector enables biopharma companies to accelerate R&D timelines and increase ROI by simplifying the implementation of in-house long-read NGS-based assays. As the only sequencing platform-agnostic, off-the-shelf analysis solution for GMP-compliant NGS assays, Genedata Selector streamlines the analysis and management of long-read data, enabling the assessment of multiple CQAs, including biosafety, identity, and potency, in a single digital platform. Genedata Selector can integrate all types of sequencing and other omics data and ensures data quality and consistency, making NGS analysis traceable and reproducible.
By leveraging wizard-based Playbooks, Genedata Selector unlocks the potential of long-read sequencing as a MAM approach. Playbooks automate data registration and complex analysis workflows, guiding scientists through all the steps needed to analyze and interpret long-read sequencing results in an intuitive interface with just a few clicks. Additionally, Genedata Selector houses an extensive library of Playbooks covering a wide range of applications and modalities, ensuring comprehensive support for diverse biopharma R&D endeavors. The Playbooks in this library allow R&D scientists to leverage long-read sequencing to investigate various CQAs, including:
- Gene Therapy QC: gene of interest truncation and fusion events, genomic variants, adventitious agent detection, gene expression profiles, vector identity, etc.
- Cell Therapy QC: genomic variants, differential gene expression profiles, single-cell gene expression profiles, plasmid contamination, adventitious agent detection, integration site analysis, copy number variants, etc.
- Master Cell Banks & Cell Line Characterization: gene expression profiles, genomic integrity & stability, product variants, adventitious agent detection, etc.
Genedata Selector comes equipped out-of-the-box with GMP functionalities, providing comprehensive sample history tracking and audit reports. The platform complies with the FDA's 21 CFR Part 11 regulation, ensuring the authenticity, integrity, and confidentiality of records for regulatory submissions. Furthermore, the platform provides scalable data storage and management capabilities, ensuring the seamless handling of large datasets across different sites. This helps to break down data silos and enhances communication and collaboration across teams. Genedata Selector supports enterprise-level scalability for cloud deployment, ensuring that biopharma R&D teams have the computational power needed for large-scale analyses and data interpretation.
Conclusion
Long-read sequencing is reshaping the biopharma industry by enabling more accurate, efficient, and comprehensive quality control throughout the R&D process. Although in-house adoption requires an initial investment, it offers a strategic advantage: the long-term benefits include greater data control, reduced outsourcing risks, and faster, data-driven decision-making. By choosing Genedata Selector, biopharma organizations can cost-effectively harness the potential of long-read sequencing in-house, enhance data analysis, and streamline regulatory submissions, while accelerating R&D timelines and increasing ROI.
References
1. Marx, V. (2022). Method of the Year 2022: long-read sequencing. Nature Methods.
2. Wang, Y. (2021). Nanopore sequencing technology, bioinformatics and applications. Nature Biotechnology.
3. Clappier, C. (2023). Deciphering integration loci of CHO manufacturing cell lines using long read nanopore sequencing. New Biotechnology.
4. Pucker, B. (2022). Plant genome sequence assembly in the era of long reads: Progress, challenges and future directions. Quantitative Plant Biology.
5. Hamim, I. (2022). How do emerging long-read sequencing technologies function in transforming the plant pathology research landscape? Plant Molecular Biology.
6. Oehler, J. B. (2023). The application of long-read sequencing in clinical settings. Human Genomics.
7. Kobayashi, E. S. (2022). Approaches to long-read sequencing in a clinical setting to improve diagnostic rate. Scientific Reports.
8. Logsdon, G. A. (2021). Long-read human genome sequencing and its applications. Nature Reviews Genetics.
9. Sripada, S. A. (2024). Advances and opportunities in process analytical technologies for viral vector manufacturing. Biotechnology Advances.
10. Hård, J. (2023). Long-read whole-genome analysis of human single cells. Nature Communications.
11. Namkung, S. (2022). Direct ITR-to-ITR Nanopore Sequencing of AAV Vector Genomes. Human Gene Therapy.