Graphical Differences Between QNets of Coronavirus and Influenza
A QNet is a collection of decision trees describing the dependencies between amino acids along the protein sequence of a virus (Li2020). Each tree yields the probability distribution of the amino acid at a certain position, given the protein sequence. By attaching all trees together, we obtain complex networks as in the figure above. The networks for coronavirus and influenza look visibly different. For example, the network for influenza has only a few central nodes with many connections, while the network for coronavirus has more highly connected nodes. These observations are vague and untested, but if they hold, this knowledge could help us distinguish a highly infectious virus, like coronavirus, from a common influenza.
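One way to make the "few hubs versus many hubs" impression measurable is to count node degrees directly. The sketch below is only an illustration: it assumes the QNet has already been exported as an undirected edge list of position pairs, which is a hypothetical format and not something the QNet software is known to produce, and the names `degree_sequence`, `top_hubs`, and `toy_edges` are made up here.

```python
from collections import Counter

def degree_sequence(edges):
    """Return {node: degree} for an undirected edge list of (u, v) pairs."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return dict(degree)

def top_hubs(degree, k=3):
    """Return the k highest-degree nodes as (node, degree) pairs.

    Ties are broken by node label so the result is deterministic.
    """
    return sorted(degree.items(), key=lambda kv: (-kv[1], kv[0]))[:k]

# Toy edge list, made up for illustration (not real QNet output).
toy_edges = [(1, 2), (1, 3), (1, 4), (2, 3), (5, 1)]
toy_degree = degree_sequence(toy_edges)
toy_hubs = top_hubs(toy_degree, k=2)
```

The values of `toy_degree` form the degree sequence that a power-law fit would take as input.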
Statistical Differences in the Degree Distribution
By fitting a power-law model to the degree distribution of the QNets, we observed a statistical difference in the value of alpha, the scaling parameter, between QNets built from coronavirus and those built from influenza. The estimation procedure follows section 3 of Clauset2009. Later, I should compare the power-law model with alternative hypotheses such as a binomial distribution.
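As a rough sketch of the estimation step, the function below uses the approximate maximum-likelihood estimator for discrete data given in Clauset2009, alpha ≈ 1 + n / sum(ln(x_i / (xmin - 0.5))), together with the leading-order standard error (alpha - 1) / sqrt(n). It assumes the lower cutoff `xmin` is supplied by hand; the paper also describes choosing `xmin` by minimizing a Kolmogorov-Smirnov distance, which is not shown here. The function name and the toy input are illustrative only and are not the code used for the results above.

```python
import math

def estimate_alpha(degrees, xmin=1):
    """Approximate discrete MLE of the power-law scaling parameter.

    Only degrees >= xmin are used. Returns (alpha, sigma), where sigma
    is the leading-order standard error (alpha - 1) / sqrt(n).
    """
    if xmin < 1:
        raise ValueError("xmin must be at least 1 for degree data")
    tail = [d for d in degrees if d >= xmin]
    if not tail:
        raise ValueError("no degrees at or above xmin")
    n = len(tail)
    # Every term is positive because d >= xmin > xmin - 0.5.
    log_sum = sum(math.log(d / (xmin - 0.5)) for d in tail)
    alpha = 1.0 + n / log_sum
    sigma = (alpha - 1.0) / math.sqrt(n)
    return alpha, sigma

# Toy degree sequence, made up for illustration (not real QNet data).
toy_degrees = [1, 1, 1, 1, 2, 2, 3, 5, 8]
alpha_hat, sigma_hat = estimate_alpha(toy_degrees, xmin=1)
```

Note that this approximation is reported to be less accurate for small `xmin`, so values obtained with `xmin=1` should be treated with caution.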
Even though we show a difference in the scaling parameters, it is still not clear whether it is due to
- the difference between a controlled disease like influenza and an epidemic disease like coronavirus, or
- biases from the creation of the QNet because of the difference in the amount of data. For coronavirus, the number of nodes is around 3000, while for influenza, it is only around 100-300. Perhaps I should sample only 300 data points from coronavirus.
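The subsampling idea in the second point could be sketched as follows. This assumes the coronavirus data are available as a list of records (the placeholder strings below stand in for real sequences) and that the subsample is drawn uniformly without replacement; `subsample` is a hypothetical helper, and the fixed seed is only there to make the draw reproducible.

```python
import random

def subsample(records, n, seed=0):
    """Draw n records uniformly at random without replacement."""
    records = list(records)
    if n > len(records):
        raise ValueError("cannot sample more records than are available")
    rng = random.Random(seed)  # local generator, global state untouched
    return rng.sample(records, n)

# Placeholder pool standing in for about 3000 coronavirus records.
pool = [f"record_{i}" for i in range(3000)]
matched = subsample(pool, 300, seed=42)
```

Repeating the draw with several seeds would show how much of the difference in alpha survives once both datasets have a comparable size.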
Measuring the Scaling Parameters over Yearly Data of H1N1
Yearly data of H1N1 show significant fluctuations in the scaling parameters, especially in 2006. However, when H1N1 became an epidemic during 2008-2009, the scaling parameters stayed close to 2.
Measuring the Effect of the Number of Data on the Scaling Parameters
I plan to plot alpha against the number of data points N for the coronavirus data, to see whether the scaling parameter depends on N.
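A minimal sketch of such a sweep is given below. Because the QNet construction and the power-law fit are separate pieces of this project, both are passed in as callables: `build_degrees` (records to degree sequence) and `fit_alpha` (degree sequence to alpha) are hypothetical placeholders, and the toy versions at the bottom exist only to show the call pattern. Several random subsamples per N are averaged so that sampling noise can be told apart from a real trend.

```python
import random
import statistics

def alpha_vs_n(records, sizes, build_degrees, fit_alpha, repeats=5, seed=0):
    """Estimate alpha on random subsamples of each size in `sizes`.

    Returns a list of (N, mean_alpha, stdev_alpha) tuples, one per size.
    """
    records = list(records)
    rng = random.Random(seed)
    results = []
    for n in sizes:
        alphas = []
        for _ in range(repeats):
            sample = rng.sample(records, n)  # without replacement
            alphas.append(fit_alpha(build_degrees(sample)))
        spread = statistics.stdev(alphas) if repeats > 1 else 0.0
        results.append((n, statistics.mean(alphas), spread))
    return results

# Toy stand-ins, made up for illustration only.
toy_records = ["a" * k for k in range(1, 51)]
toy_build = lambda sample: [len(r) for r in sample]  # fake "degrees"
toy_fit = lambda degrees: sum(degrees) / len(degrees)  # fake "alpha"
curve = alpha_vs_n(toy_records, [10, 50], toy_build, toy_fit, repeats=3)
```

The returned tuples can be passed to any plotting tool, with N on the horizontal axis and the mean alpha, plus its spread as error bars, on the vertical axis.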
