hyphy-analyses

Output of FitMG94

Open: mlosilla opened this issue 3 years ago • 2 comments

Hi,

I am trying to understand the output of the FitMG94 model with per branch calculation (--type local).

For example, the output for one node is:

    "Node20":{
      "Confidence Intervals":{
        "LB":0.05288894421311353,
        "MLE":0.1201576050549771,
        "UB":0.2263100014412389
      },
      "Nucleotide GTR":0.04948062317408397,
      "Standard MG94":0.05264881492703954,
      "dN":0.01794621524493222,
      "dS":0.1586217234027807,
      "nonsynonymous":0.01351914523388944,
      "original name":"Node20",
      "synonymous":0.03912966969315013
    }

Is the following correct:

  1. MLE is the estimate of dN/dS = w, and it has a confidence interval with lower bound (LB) and upper bound (UB) limits, estimated by "profile likelihood" (I found this phrase in another post)

  2. "synonymous" and "nonsynonymous" are the values used in the synonymous and non-synonymous trees. These are the number of synonymous and non-synonymous substitutions per codon, and also the branch lengths. Could the total branch length be computed as "synonymous" + "nonsynonymous"?

  3. dN and dS are the number of [non]-synonymous substitutions divided by the number of codons that display [non]-synonymous substitutions in the alignment ???

  4. Is w (MLE) calculated (or very closely approximated) by dN/dS?

Thanks Mau

mlosilla, Apr 07 '21 23:04

Dear @mlosilla,

  1. Correct. ω is estimated directly (i.e. not dS and dN separately; the ratio is estimated as a model parameter)
  2. Yes.
  3. No. dS is computed as "synonymous" substitutions divided by the expected number of synonymous sites (and likewise dN from "nonsynonymous" substitutions and nonsynonymous sites). More complete details are given on page 14 of http://www.hyphy.org/resources/hyphybook2007.pdf
  4. Approximated; dN/dS is not quite the same as ω. For your example, dN/dS = 0.113138445730805, while the direct estimate is ω = 0.1201576050549771. Close, but not the same. Spencer Muse had a really good paper on this close to 25 years ago. Sadly, it is not well known. https://academic.oup.com/mbe/article/13/1/105/1055486
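
The arithmetic behind items 2 to 4 can be checked directly from the Node20 record quoted in the question. The snippet below is only a minimal sketch: the variable names are made up for illustration, and the "implied site fractions" are simply back-calculated from the reported numbers under the relationship in item 3, not read from any HyPhy output field.

```python
# Values copied from the "Node20" record quoted in the question.
node20 = {
    "MLE": 0.1201576050549771,
    "Standard MG94": 0.05264881492703954,
    "dN": 0.01794621524493222,
    "dS": 0.1586217234027807,
    "nonsynonymous": 0.01351914523388944,
    "synonymous": 0.03912966969315013,
}

# Item 4: ratio of the two derived rates, to compare with the direct estimate (MLE).
ratio = node20["dN"] / node20["dS"]

# Item 2: total branch length as the sum of the two components.
total_length = node20["synonymous"] + node20["nonsynonymous"]

# Item 3 (assumption for illustration): if dS = synonymous / syn_fraction,
# then the implied fraction of synonymous sites is synonymous / dS.
syn_fraction = node20["synonymous"] / node20["dS"]
nonsyn_fraction = node20["nonsynonymous"] / node20["dN"]

print(f"dN/dS ratio        : {ratio:.6f}")
print(f"direct omega (MLE) : {node20['MLE']:.6f}")
print(f"syn + nonsyn       : {total_length:.6f}")
print(f"Standard MG94      : {node20['Standard MG94']:.6f}")
print(f"implied site fractions (syn, nonsyn): {syn_fraction:.4f}, {nonsyn_fraction:.4f}")
```

For this record the sum of "synonymous" and "nonsynonymous" agrees with the "Standard MG94" branch length, and the ratio dN/dS comes out slightly below the directly estimated ω.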

@SVMuse reads these boards once in a while, so maybe he can chime in.

Best, Sergei

spond, Apr 08 '21 00:04

Hi Sergei,

Thank you for your reply and links; it is much clearer now. A couple of follow-ups:

  1. My goal with these data is to make a figure of my phylogeny with branch lengths: a) proportional to either "non-synonymous" or "non-synonymous" + "synonymous", I haven't decided which, and b) color-coded with a heatmap of the dN/dS ratios (w).

For 1b) the correct value would be the MLE right?

  2. Some MLE estimates are very high, probably due to a lack, or near lack, of synonymous substitutions. How are those best interpreted?

  3. More of a theoretical question: how does the taxonomic breadth of the phylogeny influence the w estimates? Does the inclusion of more distantly related clades usually tend to affect dN and dS differently?
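
One possible way to assemble the per-branch values for the figure described in the first follow-up is sketched below. It assumes the per-branch records have already been loaded into a dictionary keyed by node name, with the fields shown in the Node20 example. The function name, the log scale, and the cap and floor on ω are illustrative assumptions for keeping extreme estimates from dominating a color scale; they are not HyPhy settings or recommendations from this thread.

```python
import math

# Hypothetical display limits for omega; chosen only for illustration.
OMEGA_FLOOR = 1e-3
OMEGA_CAP = 10.0


def branch_style(record, use_total=True):
    """Return (branch_length, color_value) for one per-branch record.

    branch_length is "nonsynonymous", optionally plus "synonymous".
    color_value is log10 of the MLE of omega, clipped to the display limits.
    """
    length = record["nonsynonymous"]
    if use_total:
        length += record["synonymous"]
    omega = min(max(record["MLE"], OMEGA_FLOOR), OMEGA_CAP)
    return length, math.log10(omega)


# Per-branch records keyed by node name (values from the Node20 example).
branches = {
    "Node20": {
        "MLE": 0.1201576050549771,
        "nonsynonymous": 0.01351914523388944,
        "synonymous": 0.03912966969315013,
    },
}

styles = {name: branch_style(rec) for name, rec in branches.items()}
for name, (length, color) in styles.items():
    print(f"{name}: length={length:.5f}, log10(omega)={color:.3f}")
```

Clipping only affects how a branch is drawn; the unclipped MLE and its confidence interval remain the values to report.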

Thanks Mau

mlosilla, Apr 08 '21 13:04