
Question about phasing and the cross-coalescence rate

Open luisamarins opened this issue 2 years ago • 1 comments

Hello, I have a dataset of three genomes, one individual from each of three different populations. I read in the tutorial that for a single diploid genome as input (i.e., two haplotypes), no phasing is necessary, so I didn't worry about phasing for my "within population" runs.

I now want to estimate the coalescence rate across populations, to estimate the timing of the split between them. I have a few questions:

  • Is phasing needed in this case?

  • Do I need to calculate the cross-coalescence rate for pairs of populations, or is it possible to input the three genomes at once? Below is my first attempt at running this, but I am not sure whether it makes sense for three populations or whether I should do it pairwise:

    msmc2_Linux -t 20 -s -I 0-2,0-3,0-5,0-5,1-2,1-3,1-4,1-5,2-4,2-5,3-4,3-5 -o crosspop6hap-t3cat INPUT_LIST.txt

Your help is greatly appreciated :)

luisamarins avatar Mar 14 '23 13:03 luisamarins

Yes, in this case phasing is needed. I don't fully get your list of indices. You can certainly prepare your input file for three populations, but the estimation should focus on a pair of populations between which you want to compute the cross-coalescence rate.

So if you have indices 0,1 for pop1; 2,3 for pop2 and 4,5 for pop3, then you could run

  • 0-2,0-3,1-2,1-3 to estimate the cross-coalescence rate between pop1 and pop2.
  • 0-4,0-5,1-4,1-5 to estimate the cross-coalescence rate between pop1 and pop3.
  • 2-4,2-5,3-4,3-5 to estimate the cross-coalescence rate between pop2 and pop3.
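
As a rough illustration, the index lists above could also be generated rather than typed by hand. This is only a sketch: the `populations` dictionary and the `cross_pairs` helper are hypothetical names, and it assumes the haplotype layout described above (0,1 for pop1; 2,3 for pop2; 4,5 for pop3). Each resulting string would be passed to the `-I` option as in the command from the question, in a separate run per population pair.

```python
from itertools import combinations, product

# Assumed layout: one diploid individual (two haplotypes) per population,
# indexed in the order they appear in the multi-sample input file.
populations = {"pop1": [0, 1], "pop2": [2, 3], "pop3": [4, 5]}


def cross_pairs(haps_a, haps_b):
    """Build the comma-separated list of cross-population haplotype pairs."""
    return ",".join(f"{a}-{b}" for a, b in product(haps_a, haps_b))


# One index string per pair of populations.
index_strings = {
    (a, b): cross_pairs(populations[a], populations[b])
    for a, b in combinations(populations, 2)
}

for (a, b), pairs in index_strings.items():
    print(f"{a} vs {b}: {pairs}")
```

Every pair in a given string combines one haplotype from each of the two populations, so no within-population pair (such as 0-1) appears in the list.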

stschiff avatar Mar 15 '23 16:03 stschiff