bcftools icon indicating copy to clipboard operation
bcftools copied to clipboard

bcftools mpileup v1.15 (wrongly) does not call SNPs when bcftools mpileup v1.10.2 does

Open alnam3 opened this issue 2 years ago • 7 comments

Hi,

I am using bcftools consensus using v1.15 and got SNPs which were weirdly not called, so I went back to the vcf files. I have SNPs which I know exist (I am working on simulated reads) which are called when using mpileup from version 1.10.2 and not when using version 1.15, even if a lot of alternative sequences are found (see below). DP4 changes too. I use the same reference file and the same bam file. I used the options -q 10 -d 15000 in both cases.

For instance: with v1.10.2 gene1 51 . A T 22.268 . DP=17932;VDB=0;SGB=-0.693147;RPB=1.51927e-05;MQB=0;BQB=0.0361916;MQ0F=0;ICB=1;HOB=0.5;AC=1;AN=2;DP4=4774,0,3504,0;MQ=49 GT:PL 0/1:55,0,70

with v1.15 gene1 51 . A . 17.2269 . DP=17932;VDB=0;SGB=-0.693147;RPBZ=-6.58337;MQBZ=-88.0103;BQBZ=-2.09656;SCBZ=5.45755;FS=0;MQ0F=0;AN=2;DP4=10424,0,7508,0;MQ=49 GT 0/0

I read manual pages for both versions and cannot find a default parameter which changed. Thanks a lot!

alnam3 avatar May 18 '22 14:05 alnam3

There were a lot of changes to mpileup. See the release notes in https://github.com/samtools/bcftools/releases/tag/1.13.

You may be able to just add --config 1.12 when using 1.15 to get the old behaviour, but note while there will be specific calls that the old version correctly made and the new one does not, overall the changes are a significant improvement in SNP precision vs recall tradeoffs. So I'd only advise using the old parameters if you absolutely need compatibility with old versions.

jkbonfield avatar May 23 '22 08:05 jkbonfield

It would be interesting to see what the missed SNV looked like. It would be helpful to see an IGV screenshot and the raw mpileup output for that site.

pd3 avatar May 24 '22 15:05 pd3

Thanks for your answers, I will go and look at the release notes more in depth! @pd3 I'll come back with an IGV screenshot soon. I tested the latest release on simulated nanopore reads, for which I know the mutation, and played a bit with the options. There are much fewer missed SNPs when I increase the -P, but there are still some, even with (unrealistically) high values for P, and in some positions, SNPs aren't called even if there is a really high coverage for the alternative base, so there seems to be an issue.

For instance, using mpileup -q 10 -d 75000 -a AD --config ont and call -mA, I get a problem at the following position

gene1        201     .       A       T,C,G   2.08851 .       DP=19787;VDB=0;SGB=-0.693147;RPBZ=-2.78045;MQBZ=0;BQBZ=0.28492;SCBZ=0.114625;FS=0;MQ0F=0;AC=0,0,0;AN=2;DP4=10934,0,8819,0;MQ=60 GT:PL:AD
        0/0:0,8,29,255,255,74,255,255,78,74:10934,8618,108,93

where the SNP is not called even if 8000 reads support it. I thought it might be for matters of ploidy, but there are position with similar distributions of reference and alternative reads which are properly called, e.g. in the same call:

gene1        192     .       T       A,C,G   37.9403 .       DP=19848;VDB=0;SGB=-0.693147;RPBZ=-28.0788;MQBZ=0;BQBZ=0.55589;SCBZ=0.184087;FS=0;MQ0F=0;AC=1,0,0;AN=2;DP4=10754,0,9060,0;MQ=60 GT:PL:AD
        0/1:41,0,50,255,255,108,255,255,112,108:10754,8870,101,89

Note: I have all the alternative bases called, which is quite unrealistic, but there are a lot of errors in nanopore sequences so I tried to take that into account in my sequence simulation. Thanks again.

alnam3 avatar May 24 '22 16:05 alnam3

This would require also the BAM and the reference for testing. Showing just the result again is not helpful.

pd3 avatar May 31 '22 09:05 pd3

Sorry for the delay, I had a cluster issue and lost all my intermediate files, I will provide the files asap.

alnam3 avatar Jun 09 '22 16:06 alnam3