htslib faster variant calling by threading BAQ calculation?

faster variant calling by threading BAQ calculation?

Open dkj opened this issue 5 years ago • 2 comments

If the BAQ calculation is a significant compute fraction of a variant calling pipeline can we get a faster turnaround by threading that BAQ calculation? (be it in samtools calmd, samtools mpileup, or bcftools mpileup

Aug 02 '18 09:08 dkj

It dawned on me we can do a poor mans threading here using calmd | mpileup. Eg:

Orig

$ time bcftools mpileup --threads 2 --fasta-ref $HREF -Ou ~/scratch/data/9827_2#49.1m.bam > /dev/null
[mpileup] 1 samples in 1 input files

real	0m18.121s
user	0m17.889s
sys	0m0.180s

$ time bcftools mpileup --threads 2 --fasta-ref $HREF -Ou -B ~/scratch/data/9827_2#49.1m.bam > /dev/null
[mpileup] 1 samples in 1 input files

real	0m13.964s
user	0m13.717s
sys	0m0.204s

Two threads here is simply to ensure asynchronous I/O and decoding of BAM, but I doubt thats's the bottleneck in bcftools anyway. This also shows (albeit on a tiny sample) the BAQ is about 1/3rd the cpu cost so that's how much we'd gain at most with threading this part.

Explicit calmd

$ time samtools calmd -@2 -u -E -r ~/scratch/data/9827_2#49.1m.bam $HREF | bcftools mpileup --threads 2 --fasta-ref $HREF -Ou - > /dev/null
[mpileup] 1 samples in 1 input files

real	0m15.031s
user	0m21.517s
sys	0m0.400s

It's not quite as fast, but calmd was running at about 50% CPU and bcftools at 100% so it's managing to remove most of the BAQ overhead into a second process. I checked the md5sum (when in VCF mode) before and after.

Aug 02 '18 10:08 jkbonfield

Thanks to an office discussion (with @dkj) this brings ideas for the multi-sample bcftools too.

Compare (fake 10x 1 sample, but go with it...):

time bcftools mpileup --threads 2 --fasta-ref $HREF ~/scratch/data/9827_2#49.1m.bam ~/scratch/data/9827_2#49.1m.bam ~/scratch/data/9827_2#49.1m.bam ~/scratch/data/9827_2#49.1m.bam ~/scratch/data/9827_2#49.1m.bam ~/scratch/data/9827_2#49.1m.bam ~/scratch/data/9827_2#49.1m.bam ~/scratch/data/9827_2#49.1m.bam ~/scratch/data/9827_2#49.1m.bam ~/scratch/data/9827_2#49.1m.bam | bcftools call -vmO v -o _1.vcf
Note: none of --samples-file, --ploidy or --ploidy-file given, assuming all sites are diploid
[mpileup] 1 samples in 10 input files

real	1m23.977s
user	1m34.834s
sys	0m0.772s

time bcftools mpileup --threads 2 --fasta-ref $HREF -Ou <(samtools calmd -E -@2 -r -A -u ~/scratch/data/9827_2#49.1m.bam $HREF)  <(samtools calmd -E -@2 -r  -u ~/scratch/data/9827_2#49.1m.bam $HREF) <(samtools calmd -E -@2 -r --u ~/scratch/data/9827_2#49.1m.bam $HREF) <(samtools calmd -E -@2 -r --u ~/scratch/data/9827_2#49.1m.bam $HREF) <(samtools calmd -E -@2 -r --u ~/scratch/data/9827_2#49.1m.bam $HREF) <(samtools calmd -E -@2 -r --u ~/scratch/data/9827_2#49.1m.bam $HREF) <(samtools calmd -E -@2 -r --u ~/scratch/data/9827_2#49.1m.bam $HREF) <(samtools calmd -E -@2 -r --u ~/scratch/data/9827_2#49.1m.bam $HREF) <(samtools calmd -E -@2 -r --u ~/scratch/data/9827_2#49.1m.bam $HREF) <(samtools calmd -E -@2 -r --u ~/scratch/data/9827_2#49.1m.bam $HREF)| bcftools call -vmO v -o _3.vcf
Note: none of --samples-file, --ploidy or --ploidy-file given, assuming all sites are diploid
[mpileup] 1 samples in 10 input files

real	0m28.917s
user	0m28.682s
sys	0m0.472s

The huge difference now makes me realise my previous test was invalid to some extent as it was shallow data. On deep data BAQ becomes dominant as it's per read, but the mpileup speed scales sub-linearly with read depth (it has a linear component plus a fixed portion, which is dominant at low depths).

So there is potential merit here still; even if it's just to avoid horrendous shell hackery!

Aug 02 '18 10:08 jkbonfield

htslib htslib copied to clipboard

faster variant calling by threading BAQ calculation?

Orig

Explicit calmd

htslib
htslib copied to clipboard