msmc2
msmc2 copied to clipboard
Converting time boundaries into years
Hello,
I am using MSMC2 to infer split times between several populations, and when it comes to converting the time boundaries output by MSMC2 into years I see that there are two formulas available:
- left_time_boundary/mu*gen (in the msmc2 tutorial here https://github.com/stschiff/msmc-tools/blob/master/msmc-tutorial/guide.md)
- gen * ((left_time_boundary + right_time_boundary)/2) / mu (in the book chapter in "Statistical population genomics")
As far as I understand the difference is that with the second formula the time of a given segment is placed at its middle while with the first one it is placed at the left boundary. However, both result in fairly different estimates in my case and I'm wondering whether I should prefer one or the other.
Thanks in advance, Loïs
Hmm, I'm not sure I understand the contexts of the two quotes above. I can't remember what's exactly in the book, but I guess the way to plot curves and estimate split times goes like this:
- Determine the cross-coalescence rate as a continuous but step-wise function, as done by the tool
combineCrossCoal.pyfrom the MSMC-tools repo. - Look at this curve and check when it crosses the CCR=0.5 threshold.
That's it, right?
So basically, you should never convert a time-segment to a single time, but consider a single time-segment as an interval, which has a start and end point, and ultimately the coalescence rate functions as continuous, but step-wise functions through time.
Hi @stschiff,
Sorry, my message was unclear and things were not very clear in my head either. Indeed the time segments are intervals, but I'm confused about how to convert the segments' boundaries in the output of combineCrossCoal.py into years. I think the first formula, time_boundary/mu*gen makes sense and correspond to what you describe. However in the script associated with the book chapter (https://github.com/StatisticalPopulationGenomics/MSMCandMSMC2/blob/master/plot_msmc.py) the middle of the time segments are used to calculate the time at which the CCR crosses the 0.5 threshold, and I don't understand why. Sorry if I'm missing something obvious!
Those are minor differences. Just look at the scripts and judge whether the calculation makes sense to you. It's just linear interpolations. Not sure what to advice here.