Genome sequencing has grown increasingly popular, and many major companies are developing and releasing new genome sequencers. Oxford Nanopore Technologies’ PromethION and MinION systems, for example, are receiving upgrades with improved base-calling algorithms.
Recently, a wavy pattern in array-CGH signal intensities, termed the genomic wave (3), was identified and found to impede accurate detection of copy number variants (CNVs).
Detection of SNPs
Genomic wave approaches use an empirical model that accounts for the length and GC content of the region surrounding each SNP and adjusts the SNP's signal intensity relative to nearby regions. The adjusted SNPs are then ordered by GC score and by position along the genome.
The GC-adjusted SNPs are next merged with the wave-adjusted peak intensities into a list of candidate genes and variants, whose signal intensities are compared against those of a reference genome. False positives and false negatives identified at this step inform the final determination of each SNP's pathogenicity.
Genomic wave whole genome sequencing is currently undergoing clinical trials and has shown promising diagnostic yield. Researchers found that it detected most disease-causing mutations among the sampled individuals and performed comparably to traditional exome or gene panel sequencing. Further improvement is still needed to increase the number of patients diagnosed while reducing the number of nondiagnostic tests.
One significant drawback of genomic wave sequencing is its limited applicability: only certain samples can be processed at once, and a run takes considerable time to complete. Still, the authors consider genomic wave sequencing a valuable way of diagnosing complex diseases and recommend it as a supplement to current approaches.
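The idea of a GC-based adjustment can be illustrated with a minimal sketch. It assumes the simplest possible model, a straight-line relationship between a SNP's signal intensity and the GC fraction of its surrounding window, which is not necessarily the empirical model used by published genomic wave methods. The function name `gc_adjust` and the input values are hypothetical.

```python
from statistics import mean


def gc_adjust(intensity, gc):
    """Remove a linear GC trend from per-SNP signal intensities.

    intensity: signal intensity per SNP (for example, a log R ratio)
    gc:        GC fraction (0 to 1) of the window around each SNP
    Assumption: the wave is modeled as a straight line in GC content.
    """
    if len(intensity) != len(gc) or len(intensity) < 2:
        raise ValueError("need two equal-length sequences with at least 2 SNPs")
    gc_mean, int_mean = mean(gc), mean(intensity)
    sxx = sum((g - gc_mean) ** 2 for g in gc)
    if sxx == 0:
        return list(intensity)  # GC is constant, so there is no trend to remove
    # Ordinary least-squares slope of intensity on GC content
    slope = sum((g - gc_mean) * (y - int_mean) for g, y in zip(gc, intensity)) / sxx
    # Subtract only the GC-dependent part, keeping the overall signal level
    return [y - slope * (g - gc_mean) for g, y in zip(gc, intensity)]


# Synthetic example: intensities that depend on GC content alone
gc = [0.3, 0.4, 0.5, 0.6, 0.7]
intensity = [2 * (g - 0.5) for g in gc]
adjusted = gc_adjust(intensity, gc)
print(max(abs(v) for v in adjusted) < 1e-9)  # True: the GC trend is gone
```

Subtracting only the GC-dependent component, rather than the full fitted value, leaves the average signal level untouched, so a genuine genome-wide shift is not erased by the correction.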
Several methods have been devised for evaluating SNP signal intensities on Illumina arrays. One of the most widely used is the log R ratio (LRR), a normalized measure of total SNP signal intensity, calculated as LRR = log2(Robserved/Rexpected).
Genomic evolutionary rate profiling (GERP) is another method used in SNP analysis. Rather than detecting SNPs directly, it estimates position-specific evolutionary rates and identifies candidate constrained elements by their rejected substitution (RS) scores. Variants at strongly constrained positions, including sites that may affect expression of a gene, are more likely to be functionally important.
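The LRR formula translates directly into code. The sketch below uses made-up intensity values, and the function name `log_r_ratio` is hypothetical; in practice the expected intensity comes from the array vendor's reference data rather than being supplied by hand.

```python
import math


def log_r_ratio(r_observed: float, r_expected: float) -> float:
    """Return LRR = log2(R_observed / R_expected)."""
    if r_observed <= 0 or r_expected <= 0:
        raise ValueError("intensities must be positive")
    return math.log2(r_observed / r_expected)


# Hypothetical intensities for three SNPs with the same expected value
print(log_r_ratio(1.0, 1.0))  # 0.0  (observed matches expected)
print(log_r_ratio(0.5, 1.0))  # -1.0 (half the expected intensity)
print(round(log_r_ratio(1.5, 1.0), 3))  # 0.585 (one and a half times expected)
```

An LRR near zero means the observed intensity matches expectation; sustained negative or positive runs across neighboring SNPs are what CNV callers look for.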
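GERP's rejected substitution (RS) score can be thought of as the number of substitutions expected at a position under neutral evolution minus the number actually observed. The sketch below shows only that arithmetic; the expected and observed counts are invented, the function names are hypothetical, and the threshold of 2.0 is an assumption chosen for illustration rather than a value taken from this article.

```python
def rs_scores(expected, observed):
    """RS score per position: expected minus observed substitutions."""
    if len(expected) != len(observed):
        raise ValueError("expected and observed must have the same length")
    return [e - o for e, o in zip(expected, observed)]


def constrained_positions(scores, threshold=2.0):
    """Indices whose RS score meets the (assumed) constraint threshold."""
    return [i for i, s in enumerate(scores) if s >= threshold]


# Hypothetical alignment columns: 3 substitutions expected under neutrality
expected = [3.0, 3.0, 3.0, 3.0]
observed = [0.2, 2.9, 4.1, 0.9]
scores = rs_scores(expected, observed)
print(constrained_positions(scores))  # [0, 3]
```

Positions 0 and 3 show far fewer substitutions than expected, which is the signature of constraint; position 2 has a negative score, meaning it changed faster than the neutral expectation.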
Detection of CNVs
Copy number variants (CNVs) are DNA segments whose copy number differs from that of the reference genome. Identifying CNVs on next-generation sequencing (NGS) platforms is challenging because of the large volume of reads that must be processed to detect genomic variation accurately. Approaches to CNV discovery from NGS data range from clustering algorithms based on specific genomic features to segmentation approaches such as the random forest-based CNV-RF algorithm. Segmentation methods divide the genome into non-overlapping windows of differing lengths, each covering a specific range of chromosomal coordinates. A read-depth-based signal (RDN) is computed for each window, and segmentation thresholds are set from the upper-tail and lower-tail probabilities of that signal, taking into account the false positive rate across the genome. Such approaches have proven robust to noise and mapping artifacts while improving sensitivity over traditional aCGH approaches.
NGS-based CNV detection is nonetheless limited by mapping quality and sequencing coverage, so improved segmentation methods that exploit additional genomic features could increase its sensitivity and accuracy.
Numerous NGS-based tools have been designed to identify CNVs in tumor/normal tissue samples. They draw on different genomic features, including mapped read depth, genomic location, and ploidy level, and compare tumor against normal tissue to find CNVs that occur only in the tumor.
Manta Structural Variant Caller detects structural variants in NGS data, including deletions and duplications that change copy number, from paired-end and split-read mapping evidence. It supports both germline and somatic (tumor/normal) analysis, can also be run on RNA-seq data, and reports its calls in VCF format.
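The thresholding step can be sketched as follows, assuming the per-window signal is a plain read count, the windows have equal length, and the background is approximately normal. These are simplifying assumptions for illustration; the function name `label_windows`, the tail probability of 0.01, and the depth values are all hypothetical and do not reproduce any specific published tool.

```python
from statistics import NormalDist, median


def label_windows(depths, tail_prob=0.01):
    """Label each window 'gain', 'loss', or 'neutral' by tail probability.

    The background is modeled as a normal distribution centered on the
    median depth, with a robust spread estimated from the median absolute
    deviation (MAD), so that a few CNV windows do not distort it.
    """
    center = median(depths)
    mad = median([abs(d - center) for d in depths])
    sigma = 1.4826 * mad  # MAD-to-standard-deviation factor for normal data
    if sigma == 0:
        raise ValueError("depths have no spread; cannot set thresholds")
    background = NormalDist(mu=center, sigma=sigma)
    lower = background.inv_cdf(tail_prob)      # lower-tail threshold
    upper = background.inv_cdf(1 - tail_prob)  # upper-tail threshold
    labels = []
    for d in depths:
        if d > upper:
            labels.append("gain")
        elif d < lower:
            labels.append("loss")
        else:
            labels.append("neutral")
    return labels


# Hypothetical read counts for twelve consecutive windows
depths = [100, 98, 102, 101, 99, 150, 152, 100, 97, 50, 103, 100]
labels = label_windows(depths)
print([i for i, lab in enumerate(labels) if lab == "gain"])  # [5, 6]
print([i for i, lab in enumerate(labels) if lab == "loss"])  # [9]
```

Lowering `tail_prob` makes the thresholds stricter, which reduces false positives across the genome at the cost of missing weaker signals.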
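A common way to compare tumor and normal samples is a per-window log2 ratio of read depth. The sketch below assumes matched windows in both samples and normalizes by median depth to account for different sequencing depths; the function name, the pseudocount, and the counts are hypothetical, and real tools apply further corrections such as ploidy and purity adjustment.

```python
import math
from statistics import median


def tumor_normal_log2(tumor, normal, pseudocount=1.0):
    """Per-window log2 ratio of tumor to normal read depth.

    Tumor depths are rescaled so both samples share the same median depth;
    the pseudocount avoids division by zero in empty windows.
    """
    if len(tumor) != len(normal) or not tumor:
        raise ValueError("tumor and normal need the same non-zero length")
    tumor_median = median(tumor)
    if tumor_median == 0:
        raise ValueError("tumor median depth is zero")
    scale = median(normal) / tumor_median
    return [
        math.log2((t * scale + pseudocount) / (n + pseudocount))
        for t, n in zip(tumor, normal)
    ]


# Hypothetical counts: the tumor was sequenced twice as deeply, and its
# third window carries twice the depth of its neighbors
tumor = [200, 200, 400, 200, 200]
normal = [100, 100, 100, 100, 100]
ratios = tumor_normal_log2(tumor, normal)
print([round(r, 2) for r in ratios])  # [0.0, 0.0, 0.99, 0.0, 0.0]
```

Windows with a ratio near zero behave the same in both tissues, while the third window stands out as a candidate tumor-only gain.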
Genovar is another useful tool for identifying CNVs in NGS data, as it compares detected CNVs against variants found in the Database of Genomic Variants (DGV). This comparison can help identify new mutations not yet featured within DGV. Furthermore, Genovar can be used to visualize genomic source data such as aCGH or sequence alignment data.
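Comparing detected CNVs against a catalog of known variants comes down to interval overlap. The sketch below uses reciprocal overlap with a 50% cutoff, which is an assumption made for illustration and not a description of how Genovar itself performs the comparison; the function names and coordinates are hypothetical.

```python
def reciprocal_overlap(a, b):
    """Reciprocal overlap of two (chrom, start, end) half-open intervals."""
    if a[0] != b[0]:
        return 0.0
    shared = min(a[2], b[2]) - max(a[1], b[1])
    if shared <= 0:
        return 0.0
    # The smaller of the two overlap fractions: both intervals must agree
    return min(shared / (a[2] - a[1]), shared / (b[2] - b[1]))


def novel_cnvs(detected, known, min_overlap=0.5):
    """Detected CNVs that match no known CNV at the given overlap cutoff."""
    return [
        cnv for cnv in detected
        if all(reciprocal_overlap(cnv, k) < min_overlap for k in known)
    ]


# Hypothetical catalog entry and three detected CNVs
known = [("chr1", 1000, 2000)]
detected = [("chr1", 1100, 2100), ("chr1", 5000, 6000), ("chr2", 1000, 2000)]
print(novel_cnvs(detected, known))
# [('chr1', 5000, 6000), ('chr2', 1000, 2000)]
```

Requiring the overlap to be reciprocal prevents a small detected CNV from being counted as known merely because it sits inside a much larger catalog entry.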
Detection of Variants
Structural variants (SVs) are an integral component of genetic diversity and can contribute to various diseases and phenotypes. Because of their complexity, however, SVs are difficult to detect by genomic sequencing: results depend on the sequencing platform, the library preparation strategy, and the variant calling algorithm, among other factors. Accurate SV detection is therefore key to the clinical application of whole genome sequencing.
Compared with traditional PCR, DNA sequencing techniques cover far more of the human genome, allowing more SNPs and CNVs to be identified than with traditional methods alone. SNP and CNV detection nevertheless depends on both the size and the quality of the sample, so the quality of the sequencing data should be assessed before any conclusions are drawn from it.
Genome sequence data are typically processed by aligning reads against a reference genome, calling variants, and then filtering the calls. Even so, detection of true pathogenic mutations is sometimes hindered by low sequencing coverage or an abundance of nonpathogenic variants; in addition, de novo mutations found within tumor tissues may complicate interpretation and lead to false-positive results.
Next-generation sequencing produces a distinct set of variant calls for every sample, which makes traditional quality control and annotation procedures difficult to apply. Several open-source tools can assist, including ANNOVAR, which supports genomic wave sequencing workflows.
WISARD is another useful tool for analyzing genomic wave sequencing data. It provides robust analysis for genome-wide association studies and an intuitive user interface for comparing variants across samples.
WaveCNV is another powerful CNV detection tool. It applies translation-invariant discrete wavelet transforms to next-generation sequencing data and is especially adept at detecting small CNVs.
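To show what translation invariance means for a wavelet transform, the sketch below implements an undecimated Haar transform with circular boundaries. It is an illustration of the general technique under simplifying assumptions (Haar wavelet, unnormalized averaging, wrap-around edges), not WaveCNV's actual implementation; the function name and the signal are hypothetical.

```python
def haar_swt(signal, levels):
    """Undecimated (translation-invariant) Haar wavelet transform.

    Returns (approximation, details), where details[k] holds the detail
    coefficients at level k + 1. Nothing is downsampled, so every level
    keeps one coefficient per input position.
    """
    n = len(signal)
    approx = list(signal)
    details = []
    for level in range(levels):
        step = 2 ** level  # the filter spacing doubles at each level
        shifted = [approx[(i + step) % n] for i in range(n)]
        details.append([(a - b) / 2 for a, b in zip(approx, shifted)])
        approx = [(a + b) / 2 for a, b in zip(approx, shifted)]
    return approx, details


# Hypothetical depth signal with one copy number step in the middle
signal = [0, 0, 0, 0, 1, 1, 1, 1]
_, details = haar_swt(signal, levels=1)
print(details[0])  # [0.0, 0.0, 0.0, -0.5, 0.0, 0.0, 0.0, 0.5]

# Shifting the input by one position shifts the coefficients by one position
shifted_signal = signal[-1:] + signal[:-1]
_, shifted_details = haar_swt(shifted_signal, levels=1)
print(shifted_details[0] == details[0][-1:] + details[0][:-1])  # True
```

The nonzero detail coefficients mark the two breakpoints (the second one arises from the circular boundary). Because the coefficients move with the signal, a small CNV is detected equally well wherever it falls relative to the window grid, which is not true of the ordinary decimated transform.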
Analysis of Variants
Genetic variation lies at the core of many diseases, and genome sequencing (GS) has been shown to significantly increase molecular diagnostic yield in complex clinical presentations, including those with disputed diagnoses. Compared with exome sequencing, GS allows more comprehensive detection of coding and non-coding variants, structural variants, and short tandem repeats (STRs), providing superior sensitivity for rare disease detection.
Next-generation sequencing differs from conventional microarrays in that each sample generates its own set of variant calls. Sorting through these variants to identify harmful or clinically significant ones can be time consuming and complex at the exome or genome level. SVS supports quality assurance and variant filtering tools designed to simplify this task, such as read depth-based variant calling, annotation track filters based on public catalogs, and variant classification based on impact on gene function.
Once a sample has been sequenced, the reads in the FASTQ files are aligned to the reference genome using BWA, alignment quality is assessed with Qualimap, and variants are recorded in a VCF file for analysis and quality control in SVS. This pipeline gives users faster and easier quality control and variant interpretation of NGS data.
SVS provides several filtering options for coding variants, based on read depth, annotation tracks, and impact on the protein-coding sequence.
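The three kinds of filter named here (read depth, annotation from public catalogs, and impact on protein coding) can be combined in a few lines. The sketch below is a generic illustration and does not use or represent the SVS interface; the `Variant` record, its field names, the impact labels, and the default cutoffs are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    chrom: str
    pos: int
    depth: int            # reads covering the site
    population_af: float  # allele frequency from a public catalog
    impact: str           # predicted coding impact: HIGH, MODERATE, or LOW


def filter_variants(variants, min_depth=20, max_af=0.01,
                    impacts=("HIGH", "MODERATE")):
    """Keep well-covered, rare variants with a meaningful coding impact."""
    return [
        v for v in variants
        if v.depth >= min_depth
        and v.population_af <= max_af
        and v.impact in impacts
    ]


# Hypothetical calls: only the first passes all three filters
calls = [
    Variant("chr1", 12345, depth=45, population_af=0.0002, impact="HIGH"),
    Variant("chr2", 67890, depth=8, population_af=0.0001, impact="HIGH"),
    Variant("chr3", 13579, depth=60, population_af=0.25, impact="MODERATE"),
    Variant("chr4", 24680, depth=50, population_af=0.001, impact="LOW"),
]
kept = filter_variants(calls)
print([(v.chrom, v.pos) for v in kept])  # [('chr1', 12345)]
```

Each rejected call fails a different filter: low coverage, high population frequency, and low predicted impact, respectively. Applying the cheap depth filter first is a common ordering because it removes unreliable calls before any annotation is consulted.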
SVS also provides several tools for structural variation analysis, including detection of short tandem repeats and chromosomal inversions, and can summarize the results to make potentially pathogenic mutations easier to identify.
Genome sequencing has become a valuable way of detecting pathogenic variants that are missed by standard-of-care tests such as cytogenetic testing and RT-PCR. It was used, for example, to identify SARS-CoV-2 variants during the first and second waves, which helped guide vaccination strategies. Furthermore, in a recent study that used genome sequencing as the sole diagnostic modality, more than 20% of patients with severe symptoms and no treatment received a diagnosis. Its authors conclude that GS shows promise for difficult-to-diagnose disorders and should become part of the molecular diagnostic process.