hyphy-analyses
Output of FitMG94
Hi,
I am trying to understand the output of the FitMG94 model with per branch calculation (--type local).
For example, the output for one node is:
"Node20":{
  "Confidence Intervals":{
    "LB":0.05288894421311353,
    "MLE":0.1201576050549771,
    "UB":0.2263100014412389
  },
  "Nucleotide GTR":0.04948062317408397,
  "Standard MG94":0.05264881492703954,
  "dN":0.01794621524493222,
  "dS":0.1586217234027807,
  "nonsynonymous":0.01351914523388944,
  "original name":"Node20",
  "synonymous":0.03912966969315013
}
Is the following correct:
- MLE is the estimate of dN/dS = w, and it has a confidence interval with lower bound (LB) and upper bound (UB) limits, estimated by "profile likelihood" (I found this phrase in another post).
- "synonymous" and "nonsynonymous" are the values used in the synonymous and non-synonymous trees. These are the numbers of synonymous and non-synonymous substitutions per codon, and also the branch lengths. Could the total branch length be computed as "synonymous" + "nonsynonymous"?
- dN and dS are the numbers of [non-]synonymous substitutions divided by the number of codons that display [non-]synonymous substitutions in the alignment?
- Is w (MLE) calculated (or very closely approximated) by dN/dS?
Thanks Mau
Dear @mlosilla,
- Correct. ω is estimated directly (i.e. not dS and dN separately; the ratio is estimated as a model parameter)
- Yes.
- No. They are computed as "synonymous subs" / expected synonymous sites (and likewise for non-synonymous). More complete details are given on page 14 of http://www.hyphy.org/resources/hyphybook2007.pdf
- Approximated; dN/dS is not quite the same as ω. For your example, dN/dS = 0.113138445730805, and the direct estimate is ω = 0.1201576050549771. Close, but not the same. Spencer Muse had a really good paper on it close to 25 years ago. Sadly, it is not well known. https://academic.oup.com/mbe/article/13/1/105/1055486
@SVMuse reads these boards once in a while, so maybe he can chime in.
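For anyone who wants to check the arithmetic in this thread, here is a minimal Python sketch using only the Node20 record quoted in the question. The closing braces are added so the excerpt parses as JSON, and the variable names are illustrative rather than part of HyPhy. It compares dN/dS with the directly estimated ω (the MLE) and sums the "synonymous" and "nonsynonymous" values.

```python
import json

# Node20 record copied from the question; outer braces added so it parses.
record = json.loads("""
{"Node20": {
  "Confidence Intervals": {
    "LB": 0.05288894421311353,
    "MLE": 0.1201576050549771,
    "UB": 0.2263100014412389
  },
  "Nucleotide GTR": 0.04948062317408397,
  "Standard MG94": 0.05264881492703954,
  "dN": 0.01794621524493222,
  "dS": 0.1586217234027807,
  "nonsynonymous": 0.01351914523388944,
  "original name": "Node20",
  "synonymous": 0.03912966969315013
}}
""")

node = record["Node20"]

# omega as estimated directly by the model (a fitted parameter).
omega_mle = node["Confidence Intervals"]["MLE"]

# Ratio of the reported dN and dS values; close to omega, but not identical.
ratio = node["dN"] / node["dS"]

# Sum of the synonymous and non-synonymous branch-length components.
total = node["synonymous"] + node["nonsynonymous"]

print(f"dN/dS         = {ratio:.6f}")
print(f"omega (MLE)   = {omega_mle:.6f}")
print(f"syn + nonsyn  = {total:.6f}")
print(f"Standard MG94 = {node['Standard MG94']:.6f}")
```

For this record, the sum of "synonymous" and "nonsynonymous" agrees with the "Standard MG94" value to within floating-point rounding, while dN/dS (about 0.1131) differs slightly from the MLE of ω (about 0.1202).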
Best, Sergei
Hi Sergei,
Thank you for your reply and the links; it is much clearer now. A couple of follow-ups:
- My goal with these data is to make a figure of my phylogeny with branch lengths a) proportional to either "nonsynonymous" or "nonsynonymous" + "synonymous" (I haven't decided which), and b) color-coded with a heatmap of the dN/dS ratios (w). For b), the correct value would be the MLE, right?
- Some MLE estimates are very high, probably due to a complete or near-complete lack of synonymous substitutions. How are those best interpreted?
- More of a theoretical question: how does the taxonomic breadth of the phylogeny influence the w estimates? Does the inclusion of more distantly related clades usually tend to affect dN and dS differently?
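To make the figure plan concrete, here is a rough Python sketch of how the per-branch inputs might be assembled. Everything beyond the Node20 numbers is an assumption for illustration: the "NodeX" record is made up, and the `MIN_SYN` cutoff and `OMEGA_CAP` cap are arbitrary placeholders, not HyPhy recommendations. It only shows one possible way to keep extreme w values from dominating a colour scale while flagging branches with almost no synonymous change.

```python
import math

# Per-branch values in the layout of the FitMG94 output quoted earlier.
# Only "Node20" is real; "NodeX" is a made-up branch with almost no
# synonymous change, included to show how an extreme estimate is handled.
branches = {
    "Node20": {
        "MLE": 0.1201576050549771,
        "synonymous": 0.03912966969315013,
        "nonsynonymous": 0.01351914523388944,
    },
    "NodeX": {"MLE": 5000.0, "synonymous": 1e-10, "nonsynonymous": 0.002},
}

MIN_SYN = 1e-6     # assumed cutoff below which w is flagged as poorly supported
OMEGA_FLOOR = 1e-3 # assumed lower bound so log10 stays finite
OMEGA_CAP = 10.0   # assumed upper bound for the colour scale


def figure_row(rec):
    """Return the branch length and colour value for one branch."""
    length = rec["synonymous"] + rec["nonsynonymous"]
    clipped = min(max(rec["MLE"], OMEGA_FLOOR), OMEGA_CAP)
    return {
        "length": length,
        "log10_omega": math.log10(clipped),
        "flag_low_syn": rec["synonymous"] < MIN_SYN,
    }


rows = {name: figure_row(rec) for name, rec in branches.items()}
for name, row in rows.items():
    print(name, row)
```

A log scale is used here only because w is a ratio, so values of 0.1 and 10 sit at equal distances from 1. Whether flagged branches should be drawn in a neutral colour or left out is a separate decision.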
Thanks Mau