We used real array- and karyotyping kit-based validation to assess the reproducibility of six popular CNV callers: BIC-seq2, Canvas, CNVnator, HMMcopy, FREEC and QDNAseq.
Jaccard indexes of within-WGS and WGS vs WES comparisons demonstrated that differences among callers were driven mainly by platform, with results also affected by sequencing center.
High-throughput sequencing
High-throughput sequencing enables fast and cost-effective detection of genetic variants, including CNVs. Compared with traditional array-based methods, it offers greater accuracy while being less constrained by sample size limitations. Furthermore, its comprehensive view of the genome makes it easier to identify causative disease variants. It can also be applied to tasks such as monitoring chromosomal changes in human pluripotent stem cells or karyotyping formalin-fixed paraffin-embedded (FFPE) tissues.
Many NGS tools have been created to detect CNVs from genomic data, but their performance varies significantly. To assess them, this study employed a new method for computing the reproducibility of CNV call sets based on the Jaccard index, which scores the similarity of two call sets by comparing the genomic regions covered by their CNVs. BIC-seq2 stood out as particularly accurate on both real and simulated data, with no restriction on the length of the CNVs it can detect.
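The Jaccard comparison described above can be sketched in a few lines. This is a minimal illustration, not the study's implementation: it assumes each call is a half-open `(chromosome, start, end)` interval and that similarity is measured in base pairs (bases covered by both call sets divided by bases covered by either). The function names are hypothetical.

```python
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end), half-open [start, end)
Merged = Dict[str, List[Tuple[int, int]]]


def merge_by_chrom(calls: List[Interval]) -> Merged:
    """Group calls by chromosome and merge overlapping or touching intervals."""
    by_chrom: Merged = {}
    for chrom, start, end in sorted(calls):
        merged = by_chrom.setdefault(chrom, [])
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return by_chrom


def covered_bases(merged: Merged) -> int:
    """Total number of bases covered by a merged call set."""
    return sum(end - start for ivs in merged.values() for start, end in ivs)


def intersection_bases(a: Merged, b: Merged) -> int:
    """Bases covered by both call sets, found with a two-pointer sweep."""
    total = 0
    for chrom in a.keys() & b.keys():
        xs, ys = a[chrom], b[chrom]
        i = j = 0
        while i < len(xs) and j < len(ys):
            lo = max(xs[i][0], ys[j][0])
            hi = min(xs[i][1], ys[j][1])
            if lo < hi:
                total += hi - lo
            # Advance whichever interval ends first.
            if xs[i][1] < ys[j][1]:
                i += 1
            else:
                j += 1
    return total


def jaccard_index(calls_a: List[Interval], calls_b: List[Interval]) -> float:
    """Base-pair Jaccard index: |A ∩ B| / |A ∪ B| (0.0 if both sets are empty)."""
    a, b = merge_by_chrom(calls_a), merge_by_chrom(calls_b)
    inter = intersection_bases(a, b)
    union = covered_bases(a) + covered_bases(b) - inter
    return inter / union if union else 0.0
```

A value of 1.0 means the two call sets cover exactly the same bases, and 0.0 means they share none. A real comparison might also separate gains from losses before scoring, which this sketch does not do.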
This study evaluated six popular read-depth based CNV detection algorithms: BIC-seq2, Canvas, CNVnator, FREEC, HMMcopy and QDNAseq. All were validated against real array and karyotyping kit-based data and tested on simulated ultra-low-coverage WGS data to evaluate their ability to detect CNVs at various depths.
Surprisingly, all tools except GATK gCNV and CODEX2 called more deletions than duplications for the CytoScan HD SNP-array and NA12878 Gold Standard samples. Recall varied by 100-fold between tools, with no clear pattern from one tool to the next.
Comparing results among tools requires taking into account the number of CNVs called for each dataset. This is particularly important when assessing an algorithm's ability to call CNVs on the sex chromosomes, which may have clinical relevance: CNVs have been linked with various conditions, including heart disease and cancer. In this study, BIC-seq2 was identified as the most suitable tool for calling these CNVs.
Accurate CNV calling
CNVs, genetic alterations that change the copy number of chromosomal segments, play an essential role in tumor development and progression, making their accurate identification critical. Numerous tools have been created to call CNVs from NGS data, typically using read-pair, read-depth or split-read approaches to detect gains, losses or loss of heterozygosity (LOH). Benchmarking them on shared data allows an objective evaluation of their capabilities.
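To make the read-depth idea concrete, the sketch below labels fixed-size genomic bins as gains or losses from their read counts. It is a deliberately simplified illustration and does not reproduce the algorithm of any tool named in this article: it assumes reads have already been counted per bin, uses the median bin count as the copy-neutral baseline, and applies hypothetical log2 thresholds (`gain`, `loss`) and a hypothetical `pseudocount`. Real callers also correct for GC content and mappability and then segment the signal.

```python
import math
from statistics import median
from typing import List, Sequence


def bin_log2_ratios(bin_counts: Sequence[float], pseudocount: float = 0.5) -> List[float]:
    """Log2 ratio of each bin's read count to the median bin count."""
    baseline = median(bin_counts)
    if baseline <= 0:
        raise ValueError("median read count must be positive")
    return [
        math.log2((count + pseudocount) / (baseline + pseudocount))
        for count in bin_counts
    ]


def label_bins(log2_ratios: Sequence[float], gain: float = 0.4, loss: float = -0.6) -> List[str]:
    """Label each bin using illustrative (assumed) thresholds on the log2 ratio."""
    labels = []
    for ratio in log2_ratios:
        if ratio >= gain:
            labels.append("gain")
        elif ratio <= loss:
            labels.append("loss")
        else:
            labels.append("neutral")
    return labels


# Toy input: one bin at half the typical depth, one at 1.5 times.
counts = [100, 100, 50, 100, 150, 100]
labels = label_bins(bin_log2_ratios(counts))
```

Because the signal is the read count itself, noise grows as coverage drops, which is one reason performance at ultra-low coverage is worth testing separately.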
The Jaccard index quantifies the reproducibility of CNV calling tools by measuring the similarity between call sets. It serves as an efficient means of comparing CNVs called from replicates produced at various sequencing centers or on different platforms. The six CNV calling tools were applied to 21 WGS replicates sequenced at different sequencing centers (Novartis, UC Berkeley, EMBL-HCC, Fudan University EA and Loma Linda University) and on different platforms (Illumina HiSeq and Bionano). Consensus CNV call sets were generated for each replicate and compared with calls generated using orthogonal methods. CNVs were then classified according to whether they received strong, medium or weak evidence from multiple technologies.
Multiple factors affect the accuracy of CNV detection tools, including DNA input volume and sample preparation method. Tools also differ in the number of deletions and duplications they call: for instance, GATK gCNV, CODEX2 and CLC Genomics Workbench tend to call more deletions than duplications in both WES and WGS samples. The size of called CNVs may also differ significantly by tool. Generally speaking, tools that employ read-depth approaches perform better, but even among these recall may vary by 100-fold.
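A quick way to see the deletion/duplication balance and the typical CNV size of a call set is to tally both per CNV type. The sketch below is illustrative only and assumes each call is a `(chromosome, start, end, kind)` tuple with `kind` set to `"DEL"` or `"DUP"`. The function name and tuple layout are hypothetical, not the output format of any particular caller.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, List, Tuple

Call = Tuple[str, int, int, str]  # (chromosome, start, end, kind)


def summarize_calls(calls: List[Call]) -> Dict[str, Dict[str, float]]:
    """Return the number of calls and the median call size for each CNV type."""
    sizes = defaultdict(list)
    for _chrom, start, end, kind in calls:
        sizes[kind].append(end - start)
    return {
        kind: {"count": len(lengths), "median_size": median(lengths)}
        for kind, lengths in sizes.items()
    }


# Toy call set: two deletions and one duplication.
example_calls = [
    ("chr1", 0, 1000, "DEL"),
    ("chr1", 5000, 8000, "DEL"),
    ("chr2", 0, 500, "DUP"),
]
summary = summarize_calls(example_calls)
```

Running the same summary on every tool's output for the same sample makes systematic biases, such as a tool that reports mostly deletions, easy to spot before any recall figures are compared.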
Tumor purity and fixation time also play a key role in CNV calling accuracy, and several studies have investigated their influence on CNV identification in fresh and formalin-fixed paraffin-embedded (FFPE) samples. In this study, the authors used an original dataset of tumor and normal gDNA mixtures at varied read coverage levels to examine the influence of these factors on the performance of CNV calling tools.
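The effect of tumor purity on a read-depth signal can be illustrated with a standard mixing calculation. The sketch below is a simplification, not the model used in the study: it assumes a clonal tumor population, a diploid normal population, and read depth normalized to two copies. The function name is hypothetical.

```python
import math


def expected_log2_ratio(tumor_copies: int, purity: float, normal_copies: int = 2) -> float:
    """Expected log2 copy ratio of a segment in a tumor/normal mixture.

    The average copy number in the mixture is
    purity * tumor_copies + (1 - purity) * normal_copies,
    expressed here relative to the normal copy number.
    """
    if not 0.0 <= purity <= 1.0:
        raise ValueError("purity must be between 0 and 1")
    mixed = purity * tumor_copies + (1.0 - purity) * normal_copies
    if mixed <= 0:
        return float("-inf")  # homozygous deletion in a pure tumor
    return math.log2(mixed / normal_copies)


# A single-copy loss at decreasing tumor purity.
for purity in (1.0, 0.5, 0.2):
    print(purity, round(expected_log2_ratio(1, purity), 3))
```

Under these assumptions a single-copy loss shrinks from a log2 ratio of -1 in a pure tumor toward 0 as normal DNA is mixed in, so the same event becomes harder to distinguish from noise at low purity or low coverage.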
Integrated library prep workflow
Library preparation for next-generation sequencing workflows can be labor-intensive and time-consuming, taking up to two days. Variability and contamination can prevent optimal sequencing results, producing inaccurate data that skew your research and undermine its credibility. To mitigate these problems, automating library preparation as much as possible is highly recommended. Whether you are performing whole exome sequencing or running a targeted cancer panel, there are numerous solutions to fit the application.
Traditional DNA library preparation involves fragmentation of DNA molecules into different sizes, end repair/A-tailing, and the attachment of platform-specific adapters. Many of these steps are performed manually, which often leads to considerable variation between samples because of differences in pipetting; even experienced scientists encounter this problem, and it affects the quality of the final libraries. One solution is to automate the entire process, from DNA extraction through to final sequencing libraries; automated systems offer significant time savings over manual approaches.
These systems are also well suited to high-throughput applications such as genomic profiling and clinical diagnostics. Built-in quality control solutions provide standardized quality measures for verifying library integrity, such as heuristics for identifying systematic artifacts, titration studies of intact and FFPE DNA for input optimization, and orthogonal sequencing strategies for greater detection sensitivity.
The RNA-Seq library prep workflow facilitates high-throughput genomic analysis for various applications, such as whole genome sequencing (WGS) and methylation mapping. Its efficient approach offers substantial cost savings and operational efficiencies to high-throughput laboratories, and its scalable architecture supports various sample types, from fresh samples to formalin-fixed paraffin-embedded (FFPE) tissue.
Reagents in these kits have been carefully chosen to work synergistically, optimizing library preparation for one platform. This allows users to save time by eliminating multiple reagent changes and cutting hands-on time significantly. Furthermore, the kits can easily integrate into existing workflows by being compatible with automated liquid handling systems.
Fast turnaround time
WGS allows researchers to rapidly uncover new variants at single-base resolution and pinpoint causative variants that contribute to disease, providing a much broader view than arrays alone can. WGS can detect small deletions and duplications that might be missed by microarrays or karyotyping, helping scientists better understand disease mechanisms and devise more effective therapies.
Genome-wide CNVs can be difficult to detect at low coverage levels due to low allele frequencies that overlap with other regions on chromosomes. A variety of software packages have been created to detect CNVs at different coverage and resolution levels, so researchers should compare the available approaches before selecting the one that best serves their project.
To evaluate their performance, the six popular read-depth based CNV detection algorithms (BIC-seq2, Canvas, CNVnator, FREEC, HMMcopy and QDNAseq) were run on simulated WGS data. Genomic maps (Fig 2) display predicted and ground truth CNVs for each sample. All tools except HMMcopy detected CNVs across all autosomes; BIC-seq2 and Canvas produced false positives, while HMMcopy failed to detect a large 1 Mbp duplication on chromosome X.
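Comparing predicted CNVs against a simulated ground truth, as in the paragraph above, usually comes down to counting matches. The sketch below is one common way to do this and is not necessarily the criterion used in the study: it assumes calls are half-open `(chromosome, start, end)` intervals and treats two CNVs as the same event when they share at least 50% reciprocal overlap. Both the threshold and the function names are assumptions.

```python
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end), half-open [start, end)


def reciprocal_overlap(a: Interval, b: Interval) -> float:
    """Smaller of the two overlap fractions (overlap / length) for a and b."""
    if a[0] != b[0]:
        return 0.0
    len_a, len_b = a[2] - a[1], b[2] - b[1]
    overlap = min(a[2], b[2]) - max(a[1], b[1])
    if overlap <= 0 or len_a <= 0 or len_b <= 0:
        return 0.0
    return min(overlap / len_a, overlap / len_b)


def precision_recall(
    predicted: List[Interval], truth: List[Interval], min_overlap: float = 0.5
) -> Tuple[float, float]:
    """Precision and recall of predicted CNVs against ground-truth CNVs."""
    matched_pred = sum(
        any(reciprocal_overlap(p, t) >= min_overlap for t in truth) for p in predicted
    )
    matched_truth = sum(
        any(reciprocal_overlap(t, p) >= min_overlap for p in predicted) for t in truth
    )
    precision = matched_pred / len(predicted) if predicted else 0.0
    recall = matched_truth / len(truth) if truth else 0.0
    return precision, recall
```

In these terms, a false positive is a predicted interval with no matching truth interval and lowers precision, while a missed event, such as an undetected duplication on chromosome X, lowers recall.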
Genome-wide CNV profiling, known as molecular karyotype analysis, can be used to assess patient prognosis and to determine whether patients qualify for certain treatments. This process serves as a cornerstone of personalized medicine and also helps doctors pinpoint the source of disease.
WES uses targeted sequencing assays to identify single nucleotide variants (SNVs), insertions, deletions and copy number variations (CNVs) within protein-coding genes. It can be used both to diagnose rare diseases and to study the genetic makeup of populations.
WES involves enriching exonic regions through hybrid capture or target-specific amplification, followed by high-throughput sequencing. This technique can identify disease-causing mutations (including CNVs) in patients with complex disorders, and can verify cytogenomic findings obtained through whole genome sequencing (WGS).