COVID-19 infection: Origin, transmission, and characteristics of human coronaviruses
@article{shereen2020covid,
title={COVID-19 infection: Origin, transmission, and characteristics of human coronaviruses},
author={Shereen, Muhammad Adnan and Khan, Suliman and Kazmi, Abeer and Bashir, Nadia and Siddique, Rabeea},
journal={Journal of Advanced Research},
year={2020},
publisher={Elsevier}
}

The coronavirus disease 19 (COVID-19) is a highly transmittable and pathogenic viral infection caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), which emerged in Wuhan, China and spread around the world. Genomic analysis revealed that SARS-CoV-2 is phylogenetically related to severe acute respiratory syndrome-like (SARS-like) bat viruses, therefore bats could be the possible primary reservoir. The intermediate source of origin and transfer to humans is not known, however, the rapid human to human transfer has been confirmed widely.
The genome of the SARS-CoV-2 has been reported over 80% identical to the previous human coronavirus (SARS-like bat CoV) [34].
The spike glycoprotein of SARS-CoV-2 is the mixture of bat SARS-CoV and a not known Beta-CoV [38]. In a fluorescent study, it was confirmed that the SARS-CoV-2 also uses the same ACE2 (angiotensin-converting enzyme 2) cell receptor and mechanism for the entry to host cell which is previously used by the SARS-CoV [39,40]. The single N501T mutation in SARS-CoV-2’s Spike protein may have significantly enhanced its binding affinity for ACE2 [33].
SARS-CoV has been reported to replicate and cause severe disease in Rats (F344), where the sequence analysis revealed a mutation at spike glycoprotein [45].
Murine Coronavirus with an Extended Host Range Uses Heparan Sulfate as an Entry Receptor
@article{de2005murine,
title={Murine coronavirus with an extended host range uses heparan sulfate as an entry receptor},
author={De Haan, Cornelis AM and Li, Zhen and Te Lintelo, Eddie and Bosch, Berend Jan and Haijema, Bert Jan and Rottier, Peter JM},
journal={Journal of Virology},
volume={79},
number={22},
pages={14451--14456},
year={2005},
publisher={Am Soc Microbiol}
}
Although MHV is critically dependent on murine CEACAM for cell entry and therefore only infects murine cells, MHV variants capable of infecting nonmurine cells were obtained from persistently infected cell cultures (2, 3, 31, 33). The viruses generated by Baric and coworkers (2) still used murine CEACAM as a receptor but were dependent on human CEACAM for entry into human cells. The receptor determinant of the MHV variant (MHV/BHK) generated by Sawicki and Schickli and coworkers (31, 33) has not been determined yet. Strikingly, this variant is no longer dependent on murine CEACAM for entry and appears to exhibit an even more extended host range, being able to infect cells from many different species (33). The MHV/BHK S protein (GenBank accession number AY497331) differs from the S protein of the parental MHV-A59 strain (GenBank accession number AY497328) at 57 residues and, additionally, contains a 7-amino-acid insert. Analysis of several viruses resulting from recombination between MHV-A59 and MHV/BHK demonstrated a correlation between 21 amino acid substitutions and the 7-amino-acid insert, all located in the S1 domain, with the extended host range (32). However, although introduction of these mutations into an isogenic background permitted MHV-A59 to interact with alternative receptors on murine and nonmurine cells, these viruses failed to induce a second round of infection in nonmurine cells under liquid medium, indicating that additional substitutions in S or mutations in other viral genes may be needed for efficient infection of these cells (35).
In addition, we demonstrated that the S gene of MHV/BHK is sufficient to confer the extended host range phenotype.
Our results show that only a relatively few mutations in the S protein can convert MHV from a virus that depends for its cell entry on a highly specific receptor to one that can utilize a relatively nonspecific moiety, heparan sulfate. Since these changes were rapidly acquired in persistently infected cell cultures, S gene mutations might also occur in persistently infected animals in tissues where low levels of the receptor are expressed. Such changes might contribute to interspecies transmission; hence, an increased understanding of this process is desirable. It is noteworthy that, for SARS-CoV as well, genetic variations in the S gene appear to be essential for the transition from a virus capable of animal-to-human transmission to a virus spreading from human to human (34), a transition that eventually caused the severe acute respiratory syndrome outbreak.
Of Mice and men: the coronavirus MHV and mouse models
@article{korner2020mice,
title={Of Mice and men: the coronavirus MHV and mouse models as a translational approach to understand SARS-CoV-2},
author={K{\"o}rner, Robert W and Majjouti, Mohamed and Alcazar, Miguel A Alejandre and Mahabir, Esther},
journal={Viruses},
volume={12},
number={8},
pages={880},
year={2020},
publisher={Multidisciplinary Digital Publishing Institute}
}
As with SARS-CoV, SARS-CoV-2 was first described in persons who were exposed to a live-animal market in China [15]. In the case of SARS-CoV, SARS-like coronaviruses were isolated from Himalayan palm civets (Paguma larvata) and a raccoon dog (Nyctereutes procyonoides) from live-animal markets but not in the wild. It was suspected that civets and raccoon dogs served as intermediate hosts but bats were proposed to be the natural reservoir hosts of SARS-CoV. SARS-like coronaviruses with a broad genetic spectrum were isolated from Chinese horseshoe bats (Rhinolophus sinicus). SARS-CoV-2 is 96% identical at the whole genome level to a bat coronavirus (RaTG13) and shares 79.6% sequence identity to SARS-CoV [13]. Apart from the strong evidence that SARS-CoV-2 also originated in bats [10,13], it remains unclear how the bat-human transmission occurred. Pangolins have been considered as potential intermediate hosts [16]. Interestingly, SARS-CoV-2-related coronaviruses in pangolins show an 85.5% to 92.4% sequence similarity to SARS-CoV-2 at the whole genome level [16] and a 97.4% amino acid similarity in the RBD of SARS-CoV-2. However, the remainder of the genome of SARS-CoV-2 is more closely related to the bat coronavirus RaTG13 [16]. As with SARS-CoV, the RaTG13 and the examined pangolin coronaviruses also lack the furin-like cleavage site in the S protein. This polybasic cleavage site might have facilitated the rapid spread of SARS-CoV-2 in the human population [16]. After all, the exact route of transmission from natural reservoirs to humans remains speculative.
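The identity percentages quoted above depend on how two aligned sequences are compared. The sketch below shows one common convention (matches divided by gap-free aligned columns); the cited papers may handle gaps differently, and the function name and toy strings are made up for illustration.

```python
def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percent of gap-free aligned columns in which the two sequences match."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment (equal length)")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # assumption: columns with a gap in either sequence are skipped
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * matches / compared


# Toy aligned fragments (invented for illustration, not real viral sequence).
toy_a = "ATGGC-TTACGA"
toy_b = "ATGGCATTTCGA"
print(f"{percent_identity(toy_a, toy_b):.1f}% identity")
```

Counting gap columns as mismatches instead would lower the value, which is one reason whole-genome identity figures from different papers are not always directly comparable.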
Genomic Classification Using an Information-Based Similarity Index: Application to the SARS Coronavirus
@article{goldberger2005genomic,
title={Genomic classification using an information-based similarity index: application to the SARS coronavirus},
author={Goldberger, Ary L and Peng, C-K},
journal={Journal of Computational Biology},
volume={12},
number={8},
pages={1103--1116},
year={2005},
publisher={Mary Ann Liebert, Inc. 2 Madison Avenue Larchmont, NY 10538 USA}
}
Note: this is an analysis of SARS-CoV.
Measures of genetic distance based on alignment methods are confined to studying sequences that are conserved and identifiable in all organisms under study. A number of alignment-free techniques based on either statistical linguistics or information theory have been developed to overcome the limitations of alignment methods. We present a novel alignment-free approach to measuring the similarity among genetic sequences that incorporates elements from both word rank order-frequency statistics and information theory. We first validate this method on the human influenza A viral genomes as well as on the human mitochondrial DNA database. We then apply the method to study the origin of the SARS coronavirus. We find that the majority of the SARS genome is most closely related to group 1 coronaviruses, with smaller regions of matches to sequences from groups 2 and 3. The information based similarity index provides a new tool to measure the similarity between datasets based on their information content and may have a wide range of applications in the large-scale analysis of genomic databases.
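The abstract describes combining word rank-order statistics with information theory but these notes do not record the exact formula. The following is a simplified sketch in that spirit, not the paper's index: the word length `k`, the entropy weighting, the tie-breaking rule, and the normalization are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def word_counts(seq: str, k: int) -> Counter:
    """Count overlapping k-letter words in a sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def ranks_and_probs(counts: Counter, vocab: list[str]):
    """Rank words by descending count (rank 1 = most frequent).

    Ties are broken alphabetically so that ranks are deterministic.
    """
    total = sum(counts.values())
    ordered = sorted(vocab, key=lambda w: (-counts[w], w))
    rank = {w: i + 1 for i, w in enumerate(ordered)}
    prob = {w: counts[w] / total for w in vocab}
    return rank, prob


def entropy_term(p: float) -> float:
    """Contribution -p * log(p) of one word to the Shannon entropy."""
    return -p * math.log(p) if p > 0 else 0.0


def rank_order_distance(seq_a: str, seq_b: str, k: int = 3) -> float:
    """Entropy-weighted mean rank difference of k-letter words, scaled to [0, 1]."""
    ca, cb = word_counts(seq_a, k), word_counts(seq_b, k)
    if not ca or not cb:
        raise ValueError("both sequences must be at least k letters long")
    vocab = sorted(set(ca) | set(cb))
    if len(vocab) < 2:
        return 0.0
    ra, pa = ranks_and_probs(ca, vocab)
    rb, pb = ranks_and_probs(cb, vocab)
    weights = {w: entropy_term(pa[w]) + entropy_term(pb[w]) for w in vocab}
    z = sum(weights.values())
    if z == 0:  # degenerate case: fall back to uniform weights
        weights = {w: 1.0 for w in vocab}
        z = float(len(vocab))
    spread = sum(abs(ra[w] - rb[w]) * weights[w] for w in vocab) / z
    return spread / (len(vocab) - 1)


print(rank_order_distance("AAAAAAAAAA", "ACGTACGTAC"))
```

No alignment is needed: each sequence is reduced to a ranked word list, so poorly conserved regions can still be compared.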
The outbreak of SARS in 2003 has had a tremendous impact on worldwide health care systems (Lee et al., 2003; Poutanen et al., 2003). A central question relevant to the prevention of the recurrence of future SARS outbreak is to determine the virus’s origin. Several groups have contributed to identifying and sequencing the complete genome of the newly recognized pathogen, SARS-associated coronavirus (SARS-CoV) (Rota et al., 2003; Marra et al., 2003; Drosten et al., 2003; Peiris et al., 2003). A SARS-like virus has also been isolated from wild animals such as the palm civet in southern China, indicating that SARS-CoV may have originated from a previously unidentified animal coronavirus (Guan et al., 2003). The relationship of the poorly conserved SARS genome to other coronaviruses, however, is still in question (Vogel, 2003; Enserink and Normile, 2003; Enserink, 2003) since current studies are based on the small portion of aligned sequences (Rota et al., 2003; Eickmann et al., 2003; Marra et al., 2003; Snijder et al., 2003; Stadler et al., 2003).
However, other sequences including those coding for Nsp 3 and 12 and the S protein show combined origins from other groups. For example, half of S1 domain of the S protein is partly related to the MHV, whereas the remaining sequence is related to HCoV-229E. (Note: MHV is murine hepatitis virus.)
Integrating genotypes and phenotypes improves long-term forecasts of seasonal influenza A/H3N2 evolution
@article{huddleston2020integrating,
title={Integrating genotypes and phenotypes improves long-term forecasts of seasonal influenza A/H3N2 evolution},
author={Huddleston, John and Barnes, John R and Rowe, Thomas and Xu, Xiyan and Kondor, Rebecca and Wentworth, David E and Whittaker, Lynne and Ermetal, Burcu and Daniels, Rodney Stuart and McCauley, John W and others},
journal={eLife},
volume={9},
pages={e60067},
year={2020},
publisher={eLife Sciences Publications Limited}
}
Despite the promise of these sequence-only models, they explicitly omit experimental measurements of antigenic or functional phenotypes. Recent developments in computational methods and influenza virology have made it feasible to integrate these important metrics of influenza fitness into a single predictive model. For example, phenotypic measurements of antigenic drift are now accessible through phylogenetic models (Neher et al., 2016) and functional phenotypes for HA are available from deep mutational scanning (DMS) experiments (Lee et al., 2018). We describe an approach to integrate previously disparate sequence-only models of influenza evolution with high-quality experimental measurements of antigenic drift and functional constraint.
Current vaccine predictions focus on the hemagglutinin (HA) protein, which acts as the primary target of human immunity. Until recently, the hemagglutination inhibition (HI) assay has been the primary experimental measure of antigenic cross-reactivity between pairs of circulating viruses (Hirst, 1943). Most modern H3N2 strains carry a glycosylation motif that reduces their binding efficiency in HI assays (Chambers et al., 2015; Zost et al., 2017), prompting the increased use of virus neutralization assays including the neutralization-based focus reduction assay (FRA) (Okuno et al., 1990). Together, these two assays are the gold standard in virus antigenic characterizations for vaccine strain selection, but they are laborious and low-throughput compared to genome sequencing (Wood et al., 2012). As a result, researchers have developed computational methods to predict influenza evolution from sequence data alone (Luksza and Lässig, 2014; Steinbrück et al., 2014; Neher et al., 2014).
Are DMS values publicly available for sequences?
Clades issue
I don't believe this is an issue. Clades only help in prediction. The claim is that there are subgroups A and B, and that A' is more likely to descend from A and B' from B. So what? We can incorporate this to make better predictions, though.
- We estimated viral fitness with biologically-informed metrics including those originally defined by Luksza and Lässig, 2014 of epitope antigenic novelty and mutational load (non-epitope mutations) as well as four more recent metrics including hemagglutination inhibition (HI) antigenic novelty (Neher et al., 2016), deep mutational scanning (DMS) mutational effects (Lee et al., 2018), local branching index (LBI) (Neher et al., 2014), and change in clade frequency over time (delta frequency) (Table 1).
All of these metrics except for HI antigenic novelty and DMS mutational effects rely only on HA sequences. The antigenic novelty metrics estimate how antigenically distinct each strain at time t is from previously circulating strains based on either genetic distance at epitope sites or log2 titer distance from HI measurements.
Increased antigenic drift relative to previously circulating strains is expected to correspond to increased viral fitness.
- Mutational load estimates functional constraint by measuring the number of putatively deleterious mutations that have accumulated in each strain since their ancestor in the previous season.
- DMS mutational effects provide a more comprehensive biophysical model of functional constraint by measuring the beneficial or deleterious effect of each possible single amino acid mutation in HA from the background of a previous vaccine strain, A/Perth/16/2009.
- The growth metrics estimate how successful populations of strains have been in the last six months based on either rapid branching in the phylogeny (LBI) or the change in clade frequencies over time (delta frequency).
- We fit models for individual fitness metrics and combinations of metrics that we anticipated would be mutually beneficial. For each model, we learned coefficient(s) that minimized the earth mover’s distance between HA amino acid sequences from the observed population one year in the future and the estimated population produced by the fitness model (Equation 2).
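The earth mover's distance (EMD) mentioned above can be illustrated for the simplest case: two populations of the same size in which every sequence has equal weight, with Hamming distance between amino acid sequences as the ground cost. In that special case the optimal transport plan is a one-to-one matching, so the sketch below brute-forces all matchings. This is an assumption-laden toy, not the paper's implementation: real populations carry unequal frequencies and need a proper transport or assignment solver, and the toy sequences are invented.

```python
from itertools import permutations


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def emd_uniform(pop_a: list[str], pop_b: list[str]) -> float:
    """EMD between two equally sized populations with uniform weights.

    Brute force over all n! matchings, so only suitable for tiny examples.
    """
    n = len(pop_a)
    if n == 0 or n != len(pop_b):
        raise ValueError("populations must be non-empty and of equal size")
    cost = [[hamming(a, b) for b in pop_b] for a in pop_a]
    best_total = min(
        sum(cost[i][match[i]] for i in range(n))
        for match in permutations(range(n))
    )
    return best_total / n  # average number of AA changes per sequence moved


# Toy amino acid fragments (invented for illustration).
estimated = ["MKTII", "MKTIV", "MKAIV"]
observed = ["MKTIV", "MKAIV", "MRAIV"]
print(emd_uniform(estimated, observed))
```

The result is in units of amino acids, which matches how the paper reports model error (for example "within 6.82 ± 1.52 AAs"). For larger equal-weight populations, `scipy.optimize.linear_sum_assignment` solves the same matching problem efficiently.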
How are future strains being calculated? Not clear.
- It seems that whole strains must already exist (of course no one can predict a brand-new strain); growth dynamics are then applied to them.
Closeness is still measured with an amino acid (AA) distance.
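One way to read the "growth dynamics" step is as exponential growth of existing strain frequencies followed by renormalization, in the style of Luksza and Lässig (2014). The sketch below assumes that form, x_i' = x_i * exp(f_i) / Z, with fitness as a linear combination of metrics. The function names, the metric names, and all numbers are hypothetical, and the paper's Equation 2 is not reproduced in these notes.

```python
import math


def linear_fitness(metrics: dict[str, float], coefficients: dict[str, float]) -> float:
    """Fitness as a weighted sum of per-strain metric values (assumed form)."""
    return sum(coefficients[name] * value for name, value in metrics.items())


def project_frequencies(freqs: dict[str, float], fitness: dict[str, float]) -> dict[str, float]:
    """Project current strain frequencies forward: x_i' = x_i * exp(f_i) / Z."""
    grown = {strain: x * math.exp(fitness[strain]) for strain, x in freqs.items()}
    z = sum(grown.values())  # normalizing constant so frequencies sum to 1
    return {strain: g / z for strain, g in grown.items()}


# Hypothetical two-strain population; strain B is assigned higher fitness.
current = {"A": 0.5, "B": 0.5}
fitness = {"A": 0.0, "B": math.log(3.0)}
print(project_frequencies(current, fitness))
```

Under this reading only strains already present can gain or lose frequency, which is consistent with the observation that no brand-new strain is predicted.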
As expected, the true fitness model outperformed all other models, estimating a future population within 6.82 ± 1.52 amino acids (AAs) of the observed future and surpassing the naive model in 32 (97%) of 33 timepoints (Figure 4, Table 4). Although the true fitness model performed better than the naive model’s average distance of 8.97 ± 1.35 AAs, it did not reach the closest possible distance between populations of 4.57 ± 0.61 AAs.
Revision points
- Maximum likelihood phylogenies
- Other genetic distances
- SARS2 writeup in context
- use last year's dominant strain as the prediction (naive baseline)
- impact of clades
- impact of misalignment
- impact of indels
- IRAT
ML phylogeny
http://ib.berkeley.edu/courses/ib200a/lect/ib200a_lect11_Will_likelihood.pdf
Among-Site Rate Variation (Γ): The starting hypothesis is that all sites are assumed to have equal rates of substitution. This assumption can be relaxed, allowing rates to differ across sites by having rates drawn from a gamma distribution. The gamma is useful as its shape parameter (α) has a strong influence on the values in the distribution.
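To see how the shape parameter changes the spread of site rates, the sketch below samples relative rates from a gamma distribution with shape α and scale 1/α. Fixing the mean at 1 is a common convention in phylogenetics but is an assumption here, since the lecture notes do not state the parameterization; the function name is illustrative.

```python
import random
import statistics


def sample_site_rates(alpha: float, n_sites: int, seed: int = 0) -> list[float]:
    """Draw per-site relative rates from a gamma distribution with mean 1.

    With shape = alpha and scale = 1 / alpha the mean is 1 and the variance
    is 1 / alpha, so small alpha means strong rate heterogeneity.
    """
    rng = random.Random(seed)
    return [rng.gammavariate(alpha, 1.0 / alpha) for _ in range(n_sites)]


for alpha in (0.5, 2.0, 50.0):
    rates = sample_site_rates(alpha, 20_000)
    print(alpha, round(statistics.mean(rates), 2), round(statistics.variance(rates), 2))
```

With a small α most sites evolve very slowly while a few evolve fast; with a large α the rates cluster near 1, which approaches the equal-rates starting hypothesis.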
Choosing a model: As you might imagine, there are many models already available (ModelTest discussed below looks at 56!!) and an effectively infinite number are possible. How can one choose? The program ModelTest (Posada & Crandall 1998) uses log likelihood scores to establish the model that best fits the data. Goodness of fit is tested using the likelihood ratio score.
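A likelihood ratio test between two nested models can be sketched as follows: the statistic is twice the difference in log likelihood, compared against a chi-square distribution whose degrees of freedom equal the number of extra free parameters. This is the textbook approximation, not a reproduction of ModelTest's procedure, and the log-likelihood values in the example are placeholders rather than results from any analysis.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    z = x / 2.0
    # Closed-form base cases: df = 1 (a = 0.5) and df = 2 (a = 1).
    if df % 2 == 1:
        q, a = math.erfc(math.sqrt(z)), 0.5
    else:
        q, a = math.exp(-z), 1.0
    # Recurrence for the regularized upper incomplete gamma function:
    # Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1)
    while a < df / 2.0:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q


def likelihood_ratio_test(lnl_simple: float, lnl_complex: float, extra_params: int):
    """Return (statistic, p-value) for nested models."""
    stat = 2.0 * (lnl_complex - lnl_simple)
    return stat, chi2_sf(stat, extra_params)


# Placeholder log likelihoods: JC (simpler) vs K2P (one extra parameter).
stat, p_value = likelihood_ratio_test(-1234.5, -1228.0, extra_params=1)
print(f"LRT statistic = {stat:.2f}, p = {p_value:.4g}")
```

A small p-value favors the richer model; otherwise the simpler model is retained. The test is only meaningful when the simpler model is a special case of the richer one, as JC is of K2P.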
Some typical models.
- JC, Jukes & Cantor (1969): all substitutions are equal and all base frequencies are equal. Most restrictive.
- F81, Felsenstein (1981): all substitutions are equal, base frequencies can vary.
- K2P, Kimura 2 parameter, Kimura (1980): Transitions and transversions have different substitution rates, base frequencies are assumed equal.
- HKY85, Hasegawa-Kishino-Yano (1985): Transitions and transversions have different substitution rates, base frequencies can vary.
- GTR: General Time Reversible (Lanave et al. 1984) Six classes of substitutions, base frequencies vary.
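The two simplest models in this list have closed-form pairwise distances, which makes the effect of the model choice easy to see. The sketch below computes the standard JC69 and K2P corrected distances from an aligned pair; the function names and toy sequences are made up, and columns containing anything other than A, C, G, or T are skipped by assumption.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def count_differences(seq_a: str, seq_b: str):
    """Return (sites compared, transitions, transversions) for an aligned pair."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned (equal length)")
    n = ts = tv = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguity codes
        n += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            ts += 1  # purine<->purine or pyrimidine<->pyrimidine
        else:
            tv += 1
    if n == 0:
        raise ValueError("no comparable sites")
    return n, ts, tv


def jc69_distance(seq_a: str, seq_b: str) -> float:
    """Jukes-Cantor distance: d = -3/4 * ln(1 - 4p/3)."""
    n, ts, tv = count_differences(seq_a, seq_b)
    p = (ts + tv) / n
    if p >= 0.75:
        raise ValueError("sequences too divergent for the JC69 correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def k2p_distance(seq_a: str, seq_b: str) -> float:
    """Kimura 2-parameter distance: d = -1/2 * ln((1 - 2P - Q) * sqrt(1 - 2Q))."""
    n, ts, tv = count_differences(seq_a, seq_b)
    P, Q = ts / n, tv / n
    if 1.0 - 2.0 * P - Q <= 0 or 1.0 - 2.0 * Q <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log((1.0 - 2.0 * P - Q) * math.sqrt(1.0 - 2.0 * Q))


# Toy pair differing by a single A->G transition in 10 sites.
toy_a = "ACGTACGTAC"
toy_b = "GCGTACGTAC"
print(jc69_distance(toy_a, toy_b), k2p_distance(toy_a, toy_b))
```

Both corrected distances exceed the raw proportion of differing sites (0.1 here), because the models account for multiple substitutions at the same site.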
Supposed advantages.
- Appropriate for simple data like DNA sequences, where we can reasonably model the largely stochastic processes, i.e. a statistical description of the stochastic processes.
- lower variance than other methods (i.e. estimation method least affected by sampling error)
- robust to many violations of the assumptions in the evolutionary model, even with very short sequences it may outperform alternative methods such as parsimony or distance methods.
- the method is statistically well understood
- has explicit model of evolution that you can make fit the data
- evaluate different tree topologies (vs. NJ)
- use all the sequence information (vs. Distance)
- better accounting for branch lengths, e.g. incorporates “multiple hits” thereby providing more realistic branch length and reducing the region of LBA. Also, information is derived from sites that would be uninformative under parsimony.
Supposed disadvantages.
- very computationally intensive and so slow (though this is becoming much less of an issue)
- Apparently susceptible to asymmetrical presence of data in partitions (see Simmons, M.P., 2012. Misleading results of likelihood-based phylogenetic analyses in the presence of missing data. Cladistics 28:208-222.)
- the result depends on the model used; the information derived from sites that are uninformative under parsimony is due only to the model used.
- questionably applicable to complex data like morphology given the difficulty of modeling the numerous processes
- philosophically less well established, especially in terms of the applicability of probabilities and statistical measures of unique historical events (vs. Parsimony as a general principle). This is a fundamental distinction between reconstruction and estimation, e.g. “Although the true phylogeny may be ‘unknowable’ it can nonetheless be estimated…” (“Phylogenetic Inference”, Swofford, Olsen, Waddell, and Hillis, in Molecular Systematics, 2nd ed., Sinauer Ass., Inc., 1996, Ch. 11).
@article{simmons2012misleading,
title={Misleading results of likelihood-based phylogenetic analyses in the presence of missing data},
author={Simmons, Mark P},
journal={Cladistics},
volume={28},
number={2},
pages={208--222},
year={2012},
publisher={Wiley Online Library}
}
IRAT
https://www.cdc.gov/flu/pdf/pandemic-resources/cd-irat-validation-report.pdf
Coding
- make table/csv comparing naive baseline (predict last season's strain)
- add IRAT results
- correct log-likelihood values? Should we do it?
- redraw phylogeny in star mode to show that rats are further apart
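A possible starting point for the naive-baseline table: take the most frequent sequence of the last season as the prediction, then report its mean Hamming distance to the sequences seen in the next season. Treating "dominant strain" as the most frequent exact sequence is an assumption, the column names are placeholders, and the toy sequences are invented.

```python
import csv
import io
from collections import Counter


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def naive_baseline_distance(last_season: list[str], next_season: list[str]) -> float:
    """Mean distance from last season's most common sequence to next season's sequences."""
    dominant, _count = Counter(last_season).most_common(1)[0]
    return sum(hamming(dominant, s) for s in next_season) / len(next_season)


def baseline_table(seasons: dict[str, tuple[list[str], list[str]]]) -> str:
    """Build CSV text with one row per season transition."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["season", "naive_mean_aa_distance"])
    for name, (last, nxt) in seasons.items():
        writer.writerow([name, f"{naive_baseline_distance(last, nxt):.2f}"])
    return buffer.getvalue()


# Toy amino acid fragments (invented for illustration).
toy_last = ["MKTII", "MKTII", "MKTIV"]
toy_next = ["MKTIV", "MKTIV", "MKAIV", "MKTII"]
print(baseline_table({"toy": (toy_last, toy_next)}))
```

Model predictions can be added as further columns computed with the same distance, so each row compares the model directly against the baseline.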
