In the flour beetle Tribolium castaneum, gap genes are expressed as dynamic waves that propagate from posterior to anterior. These waves are initiated by enhancers that switch between two modes of activity.
CAGE-seq and double DNA-FISH analyses demonstrate that the miR-430 cluster facilitates early zygotic transcription from sharp promoters, such as that of the gata3 gene on chromosome 2. Single-copy genes whose promoter architecture resembles that of miR-430 escape global repression and become active during the minor wave of zygotic genome activation (ZGA).
Background
Genomic studies, such as genome-wide association studies (GWASs), seek to identify variants that contribute to human disease by employing whole genome sequencing techniques to search for disease-associated variation across multiple populations. GWAS approaches are powerful and have provided many new insights into the causes of complex disease, but they have limitations. One drawback is that it can be challenging to pinpoint the functional variants responsible for an association signal, even within known regulatory regions. Various methods have therefore been proposed to prioritize regions containing putative functional variants.
Although these methods can provide useful results, the effect of genomic background on their performance must be kept in mind. Functional variants with small effects may be harder to distinguish from noise than variants with large effects, which can produce false positives if not carefully assessed. Accurate evaluation of such tools therefore requires testing them against controlled datasets that include experimentally verified non-functional variants, which is challenging because few such variants are available.
To address this challenge, we combined the HGMD and ENCODE pilot region background variant datasets. To create our training set, we randomly sampled from HGMD non-synonymous SNPs upstream of genes conserved between the human and mouse genomes. This training set was then spiked into the RAVEN background variant dataset to assess the ability of SuRFR and other tools to prioritize functional variants.
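The spike-in assessment described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the evaluation code used in the study: the variant records, the `score` field and the helper names (`spike_in`, `roc_auc`, `evaluate`) are all hypothetical, and the area under the ROC curve is one possible summary of how well a tool ranks spiked-in functional variants above background.

```python
import random


def roc_auc(positive_scores, negative_scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


def spike_in(functional, background, n_spike, seed=0):
    """Randomly sample n_spike functional variants and mix them into the background set.

    Returns a shuffled list of (variant, label) pairs, label 1 = spiked-in functional.
    """
    rng = random.Random(seed)
    spiked = rng.sample(functional, n_spike)
    labelled = [(v, 1) for v in spiked] + [(v, 0) for v in background]
    rng.shuffle(labelled)
    return labelled


def evaluate(labelled, score_fn):
    """Score every variant with score_fn and summarise the ranking as an AUC."""
    pos = [score_fn(v) for v, label in labelled if label == 1]
    neg = [score_fn(v) for v, label in labelled if label == 0]
    return roc_auc(pos, neg)


# Hypothetical toy data: each variant carries a precomputed prioritization score.
functional = [{"id": f"f{i}", "score": 0.60 + 0.04 * i} for i in range(10)]
background = [{"id": f"b{i}", "score": 0.05 * i} for i in range(10)]

labelled = spike_in(functional, background, n_spike=5)
print(evaluate(labelled, lambda v: v["score"]))
```

Fixing the random seed keeps the spiked-in subset identical across tools, so that differences in AUC reflect the scoring function rather than the sample.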
At our institute, we compared SuRFR with GWAVA, CADD and FunSeq on two ClinVar pathogenic versus background variant sets. The three other tools performed less favorably than SuRFR on both sets.
As part of our model evaluation, we conducted simulations based on the Enhancer Switching model, which describes how dynamic enhancers can trigger gene expression waves. We ran these simulations for the activity of the pair-rule gene run and the gap gene enhancer hbA, and observed that the simulated expression patterns mirror those driven by dynamic enhancers. This supports the biological plausibility of our predictions, which we examined further by exploring the spatiotemporal dynamics of these enhancers in Tribolium castaneum flour beetle larvae.
Methods
Wave genome ra involves the analysis of each gene promoter region to detect variants that alter expression or are linked with obesity-associated phenotypes. Many approaches have been employed to narrow the set of candidate variants, including genome-wide association studies (GWAS) and whole genome sequencing (WGS). Unfortunately, both approaches can produce false positives and require large sample sizes for replication. Epigenomic studies, including ATAC-seq and histone modification chromatin mapping, may provide a more targeted solution that is less time- and cost-intensive for smaller patient populations. This type of investigation uses genome-wide data to assess the impact of variants on gene expression levels in a clinical setting at relatively low cost.
Publicly available epigenomic data were used to identify chromatin states associated with gene activity. Chromatin opening, indicated by the H3K4me3 and H3K27ac marks on histone H3, is linked with activation of cis-regulatory elements. ATAC-seq was used to measure accessibility across nine cell lines, and chromatin states were then predicted using multivariable logistic regression models.
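A multivariable logistic regression of this kind can be sketched in a few lines. The example below is illustrative only and rests on assumptions: the two features (an ATAC-seq accessibility signal and an H3K27ac signal), the toy values, the learning rate and the function names are hypothetical, and the fit uses plain batch gradient descent rather than whatever solver the study relied on.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights and intercept by batch gradient descent on the mean log-loss."""
    n_samples = len(X)
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi)))
            err = p - yi  # derivative of the log-loss w.r.t. the linear predictor
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n_samples for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n_samples
    return w, b


def predict_proba(w, b, x):
    """Predicted probability that a region is in the active chromatin state."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


# Hypothetical toy data: [ATAC-seq signal, H3K27ac signal] per region, scaled to 0..1.
X = [[0.9, 0.8], [0.8, 0.9], [0.7, 0.6], [0.2, 0.1], [0.1, 0.3], [0.3, 0.2]]
y = [1, 1, 1, 0, 0, 0]  # 1 = active state, 0 = inactive state

w, b = fit_logistic(X, y)
print(predict_proba(w, b, [0.9, 0.9]), predict_proba(w, b, [0.1, 0.1]))
```

In practice an established library implementation with regularization and cross-validation would be preferable; the hand-written loop is only meant to make the model explicit.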
SuRFR was evaluated against GWAVA, CADD and FunSeq on two ClinVar pathogenic and background datasets. SuRFR outperformed all three tools in both sensitivity and specificity, and its predictions were later verified by several downstream genomic analyses.
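For reference, sensitivity and specificity for a scoring tool at a given cutoff can be computed as in the sketch below. The labels, scores and threshold are hypothetical illustration values, not results from the comparison above.

```python
def sensitivity_specificity(labels, scores, threshold):
    """Return (sensitivity, specificity) when variants scoring >= threshold are called functional.

    labels: 1 for pathogenic/functional variants, 0 for background variants.
    """
    tp = fn = tn = fp = 0
    for label, score in zip(labels, scores):
        called_functional = score >= threshold
        if label == 1 and called_functional:
            tp += 1
        elif label == 1:
            fn += 1
        elif called_functional:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical example: two pathogenic and two background variants.
example = sensitivity_specificity([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1], threshold=0.5)
print(example)
```

Because both measures depend on the chosen threshold, comparisons between tools are usually made across a range of cutoffs rather than at a single one.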
As part of our analyses of miR-430 activation, the associated chromatin states were also analysed using both histone modifications and DNA sequence features. We found that miR-430 genes occupied chromatin states enriched for TATA boxes and had longer transcription initiation profiles than other gene groups. We also compared their core promoter architecture with that of other gene groups and found that a subset had sharp transcription start site (TSS) peaks (Figure 3A).
These data support the hypothesis that differences in chromatin state and DNA sequence features may contribute to the formation of minor wave gene clusters. The activation patterns of these genes appear to depend on the bromodomain protein BRD4 itself or on pioneer transcription factors such as Nanog, Pou5f3 and Sox19b as potential mediators.
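One common way to quantify whether a promoter is sharp or broad is the interquantile width of its CAGE signal, that is, the span containing the central bulk of the tags. The sketch below assumes the 10% and 90% quantiles and uses made-up tag counts; the function name and the example promoters are hypothetical, and the study may have used a different definition.

```python
def interquantile_width(tag_counts, lower=0.1, upper=0.9):
    """Width in bp between the positions where cumulative CAGE signal crosses lower and upper.

    tag_counts: iterable of (genomic_position, tag_count) pairs for one promoter.
    """
    sites = sorted(tag_counts)
    total = sum(count for _, count in sites)
    if total <= 0:
        raise ValueError("promoter has no CAGE tags")
    cumulative = 0
    q_low = q_high = None
    for position, count in sites:
        cumulative += count
        if q_low is None and cumulative >= lower * total:
            q_low = position
        if cumulative >= upper * total:
            q_high = position
            break
    return q_high - q_low + 1


# Hypothetical promoters: one dominated by a single TSS, one with dispersed initiation.
sharp_promoter = [(100, 2), (101, 95), (102, 3)]
broad_promoter = [(100 + i, c) for i, c in enumerate([4, 10, 16, 20, 20, 16, 10, 4])]

print(interquantile_width(sharp_promoter), interquantile_width(broad_promoter))
```

A small width indicates that initiation is focused on one or a few positions, which is the pattern described here for sharp TSS peaks.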
Results
Recent genome-wide association studies (GWASs) of obesity have yielded four susceptibility genes that may form the foundation of future therapeutic interventions for this complex disorder. Clarifying their regulatory architecture and the DNA sequence motifs underlying their expression patterns will be key to translating them into clinical practice.
A GWAS that used BMI as a continuous trait and focused on the most severely obese individuals was used to identify the first wave gene. After rigorous replication requirements were satisfied, this analysis indicated four obesity susceptibility loci.
These loci possess a characteristic promoter structure and DNA sequence motif, containing a TATA box and a sharp transcription start site (TSS) profile, which suggests that they escape global repression through promoter-autonomous mechanisms. Their sharp core promoter architecture differs significantly from that of the broad-promoter znf genes expressed during the major wave of ZGA, and they may take advantage of favorable chromatin environments at their activation sites.
Although these genes have high GC content, their promoter densities tend to be relatively low and their promoters show minimal association with histone marks such as H3K4me3, H3K27me3 or H3K27ac. Furthermore, their promoter regions often lack known cis-regulatory elements and contain few known transcription factor binding motifs.
To identify regulatory sequence motifs that contribute to the common promoter characteristics of these genes, we used publicly available epigenomic data sets representing chromatin opening (Assay for Transposase-Accessible Chromatin [ATAC-seq]) and cis-regulatory element regulation (histone mark data from the ENCODE pilot project). We combined these chromatin states with genomic features (such as gene density, non-exonic conservation scores and exon skipping) to develop a classifier of functional variants.
The classifier was trained and validated with multivariable logistic regression on data sets from nine cell lines that combined chromatin states and genomic features. It was then tested on independent genotyped samples from the 1000 Genomes EUR population, where it achieved a performance of 0.89 and discriminated functional from background variants well.
Conclusions
The results presented here show that wave genome ra can accurately predict functional regulatory polymorphisms and can help inform experimental design. It complements GWAS, which traditionally relies on replication studies to identify variants with biological impact. Such studies can be costly and labor intensive and may generate false positives. By comparison, wave genome ra is an easier and more cost-effective method of discovering novel genomic variants with potential functional impact.
Wave genome ra is an automated tool for interpreting association results using gene set enrichment analysis and can be tailored to different types of data sets. This makes it a useful resource for researchers seeking to discover novel regulatory variants and to understand their effects on complex diseases such as obesity.
To assess the selection pressure on SARS-CoV-2 spike protein variants, we conducted molecular phylogenetic analyses of the most prevalent SARS-CoV-2 genotypes found in wave-1 and wave-2 samples from Bangladesh. We found that mutations within the receptor binding domain (RBD) of the spike (S) protein, specifically N501Y, K417N and E484K, made these variants more infectious by increasing their ability to escape host immune defenses and spread among humans.
We investigated the effect of these mutations by performing molecular docking between the human ACE2 receptor and the reference protein B.1.1.7, as well as the two most prevalent spike protein variants, B.1.351 and B.1.617. Both variants displayed higher binding affinity than the reference at physiological temperature and pH.
Finally, we analyzed the effect of the mutations on the folding behavior of the spike protein by conducting structural simulations with the COMPASS model. Notably, RMSD graphs revealed that the reference protein folded steadily, whereas the B.1.617 variant underwent periodic unfolding events. SASA graphs showed reduced water solubility, suggesting that these mutations may have significant consequences for the stability or folding behavior of the spike protein.
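The RMSD values plotted in such graphs measure how far a structure has drifted from a reference conformation. A minimal sketch of the calculation is given below, assuming the two coordinate sets list the same atoms in the same order and have already been superimposed; the function name and coordinates are hypothetical, and simulation packages normally perform the alignment step as well.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally sized lists of (x, y, z) coordinates.

    No superposition is performed: the structures are assumed to be pre-aligned.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate sets must be non-empty and of equal length")
    squared = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared / len(coords_a))


# Hypothetical two-atom example: the second atom moves 2 units along z.
reference_frame = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
later_frame = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]
print(rmsd(reference_frame, later_frame))
```

Computing this value for every frame of a trajectory against the starting structure gives the kind of RMSD-versus-time curve in which a steady plateau suggests a stable fold and repeated excursions suggest unfolding events.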