msmc2
Using a phased VCF file as input for MSMC2
Hi, I want to run MSMC2 on my dataset, which consists of phased VCF files (multi-sample VCF files with 26 samples), one per chromosome (e.g. Chr10.vcf.gz).
To use my VCF files as input for MSMC2, I proceeded as follows:
First, I used bcftools to produce a separate VCF file for each sample (e.g. sample1.Chr10.vcf.gz). Second, I used vcfAllSiteParser.py to produce the .bed files. Then I ran generate_multihetsep.py to merge the VCF and mask files. I skipped the phasing step because I assumed my dataset was already phased.
However, I received an error in the last step, when I ran msmc2 to estimate the effective population size. I also noticed that the resulting multihetsep.txt files (e.g. Chr10.multihetsep.txt) are very large.
My question is: should I run the phasing step as well?
I would really appreciate your help in identifying the problem.
Best wishes, Niloo
As discussed via email, I think the issue is that your phased VCF is not recognised as being phased. Phased genotypes require a notation like 0|1 or 1|0. If you have 0/1 instead, the genotype is treated as unphased, which leads to combinatorially many haplotype combinations and inflates the size of the resulting file.
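A quick way to check whether a VCF actually uses the phased notation is to count heterozygous genotypes by separator. The following is only a minimal sketch: the function names `count_het_phasing` and `count_het_phasing_in_file` are hypothetical helpers, not part of msmc-tools or bcftools, and the example records are made up for illustration. It assumes a standard tab-separated VCF with a `GT` entry in the FORMAT column.

```python
import gzip
from collections import Counter


def count_het_phasing(vcf_lines):
    """Count heterozygous genotypes written as phased (|) or unphased (/)."""
    counts = Counter()
    for line in vcf_lines:
        # Skip header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10:
            continue  # no FORMAT/sample columns on this record
        fmt = fields[8].split(":")
        if "GT" not in fmt:
            continue
        gt_idx = fmt.index("GT")
        for sample in fields[9:]:
            parts = sample.split(":")
            gt = parts[gt_idx] if gt_idx < len(parts) else "."
            alleles = gt.replace("|", "/").split("/")
            # Missing and homozygous calls carry no phase information.
            if "." in alleles or len(set(alleles)) < 2:
                continue
            counts["phased_het" if "|" in gt else "unphased_het"] += 1
    return counts


def count_het_phasing_in_file(path):
    """Same check for a plain or gzip/bgzip-compressed VCF on disk."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return count_het_phasing(handle)


# Made-up records: two phased hets, one unphased het, one homozygous call.
example = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\ts1\ts2\n",
    "Chr10\t100\t.\tA\tG\t.\tPASS\t.\tGT\t0|1\t1|0\n",
    "Chr10\t200\t.\tC\tT\t.\tPASS\t.\tGT:DP\t0/1:12\t0|0:9\n",
]
print(count_het_phasing(example))
```

If the `unphased_het` count is non-zero for a file that is supposed to be phased, those genotypes will be treated as unphased downstream. For a real file, something like `count_het_phasing_in_file("sample1.Chr10.vcf.gz")` could be used, assuming the file name from the question above.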
Yes, we discussed it, and I also checked my phased VCF files. The problem comes from their format, and I am working on solving it.
Many thanks for your help
Niloo
From: Stephan Schiffels @.***> Sent: 22 June 2023 13:36:26 To: stschiff/msmc2 Cc: Niloofar Alaei Kakhki; Author Subject: Re: [stschiff/msmc2] using phased-vcf file as input for MSMC2 (Issue #52)