bcftools icon indicating copy to clipboard operation
bcftools copied to clipboard

Feature request: inverse of norm --atomize to join MNPs

Open Andy-B-123 opened this issue 1 year ago • 2 comments

Hi, I am using bcftools version 1.15.1 and am having trouble getting MNP calls to happen. I have a case here where there are 2 adjacent SNP variants which I believe should be called as an MNP: image

However I can't seem to be able to get bcftools to call these as a single variant? I've looked through and see references to MNPs being supported in the manual and in other issues but I haven't been able to replicate this. It's a problem for my as in this instance the 2 single variants get annotated incorrectly with SnpEff (it calls it a stop-gain when it's actually a mis-sense if assessing the 2 calls together).

Would you have any thoughts on how I can get MNP calls working? Or am I mis-understanding the MNP call?

Commandline for run, tried default (below) and consensus caller (same result): bcftools mpileup -Ou -a AD,DP,SP -f $reference $sample.bam | bcftools call -a GQ,GP --variants-only -m -O z -o sample.bcftools.default.vcf.gz

VCF line of 2 individual variants:

bwa_aligned.filt.counts_GATC.32g14      6954184 .       C       A       225.417 .       DP=21;VDB=0.217395;SGB=-0.688148;MQSBZ=0.117851;FS=0;MQ0F=0;AC=2;AN=2;DP4=0,0,7,8;MQ=38 GT:PL:DP:SP:AD:GP:GQ    1/1:255,45,0:15:0:0,15:0,0,1:127
bwa_aligned.filt.counts_GATC.32g14      6954185 .       A       T       225.417 .       DP=21;VDB=0.211126;SGB=-0.688148;MQSBZ=0.117851;FS=0;MQ0F=0;AC=2;AN=2;DP4=0,0,7,8;MQ=38 GT:PL:DP:SP:AD:GP:GQ    1/1:255,45,0:15:0:0,15:0,0,1:127

Included a (hopefully) mini-reproducible example in the below zip. Contains the alignments for that region and the contig from the reference file. Ran with below to generate vcf:

bcftools mpileup -Ou -a AD,DP,SP -f RefSingle.fasta GRx17-GP_2.sort.MNPRegion.sort.bam | bcftools call -a GQ,GP --variants-only -m -O z -o test.example.vcf.gz temp_AssessMNP.zip

Andy-B-123 avatar Jan 04 '23 23:01 Andy-B-123

The program calls SNVs independently and reports MNPs in separate records, so this is not a bug. Unfortunately, we do not have a tool in bcftools yet that would join MNPs into a single record. I'll rename the issue as feature request.

Just to cross reference, the same has been planned and requested also by https://github.com/samtools/bcftools/issues/1724

pd3 avatar Jan 13 '23 10:01 pd3

Thank you for the reply! Yes, essentially it would be the reverse of expanding MNPs.

How does BCFtools flag or report MNPs as separate records? I can't seem to find a parameter in the calls that would flag that two SNPs should be treated as an MNP? GatKs 'Genotype gVCFs' also outputs separate variants for MNPs but they have the "PGT" and "PID" flags which helps identify when variants should be linked, is there an option for this with bcftools call?

Quick edit: I just looked at it seems like "PS" and "HP" seem like the more common flags, at least according to the WhatsHap documentation? https://whatshap.readthedocs.io/en/latest/guide.html#phasing-in-vcfs

Andy-B-123 avatar Jan 16 '23 01:01 Andy-B-123