vcf2maf
vcf2vcf genotyping using bam input
One of the really nice features of vcf2vcf is the genotyping feature. I don't know of any other software that really has this capability. I've tried freebayes force-calling, which is very fast but doesn't work as expected (it skips a handful of variants), and bcftools/samtools don't seem to offer this possibility without additional post-processing (as vcf2vcf essentially does).
However, I've found vcf2vcf to be too slow when genotyping a large number of variants. I wonder if this could be made more efficient by using samtools mpileup on a BED file covering all of the features in the VCF file and post-processing the results, rather than calling samtools mpileup separately for each variant in the VCF file?
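To make the idea concrete, here is a minimal sketch of the first half of that approach: collapsing VCF records into merged BED intervals, so that a single pileup run could cover every site. The function name `vcf_to_bed` is made up for illustration and is not part of vcf2vcf. It assumes tab-separated VCF lines, and uses the usual coordinate conversion (VCF POS is 1-based, BED is 0-based and half-open).

```python
def vcf_to_bed(vcf_lines):
    """Convert VCF records to merged BED intervals (0-based, half-open).

    Hypothetical helper for illustration; not part of vcf2vcf.
    Chromosomes are emitted in order of first appearance.
    """
    intervals = {}
    for line in vcf_lines:
        # Skip blank lines and header lines
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref = fields[0], int(fields[1]), fields[3]
        start = pos - 1          # 1-based POS -> 0-based start
        end = start + len(ref)   # cover every reference base of the variant
        intervals.setdefault(chrom, []).append((start, end))

    bed = []
    for chrom, spans in intervals.items():
        spans.sort()
        cur_start, cur_end = spans[0]
        for start, end in spans[1:]:
            if start <= cur_end:
                # Overlapping or adjacent: extend the current interval
                cur_end = max(cur_end, end)
            else:
                bed.append((chrom, cur_start, cur_end))
                cur_start, cur_end = start, end
        bed.append((chrom, cur_start, cur_end))
    return bed


records = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t100\t.\tA\tT\t.\t.\t.",
    "1\t101\t.\tACG\tA\t.\t.\t.",
    "2\t50\t.\tG\tC\t.\t.\t.",
]
for chrom, start, end in vcf_to_bed(records):
    print(f"{chrom}\t{start}\t{end}")
```

The resulting BED file could then be passed to samtools mpileup via its `-l` option, with the per-variant counts recovered by post-processing the single pileup output. That second half is the harder part and is not shown here.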
Unrelated, but wondering why the DP tag is chosen over DP4 to output read depth?
Thanks. I agree - vcf2vcf's genotyping feature should be sped up with a BED file. I use a similar strategy to speed up samtools faidx when pulling flanking bps. But that was easy, since samtools faidx can take many regions on the command line.
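For readers unfamiliar with the batching trick, a rough sketch of the idea follows: build one region string per variant, then hand them all to a single samtools faidx call instead of one call per variant. The helper name `faidx_regions`, the `flank` parameter, and the file name `ref.fa` are assumptions for illustration, not the actual vcf2maf code.

```python
def faidx_regions(variants, flank=1):
    """Build 1-based, inclusive region strings with flanking bases.

    `variants` is an iterable of (chrom, pos, ref) tuples using VCF
    coordinates. Hypothetical helper, not taken from vcf2maf.
    """
    regions = []
    for chrom, pos, ref in variants:
        start = max(1, pos - flank)          # do not run off the contig start
        end = pos + len(ref) - 1 + flank     # last REF base plus flank
        regions.append(f"{chrom}:{start}-{end}")
    return regions


variants = [("1", 100, "A"), ("2", 1, "AC")]
regions = faidx_regions(variants, flank=2)

# One invocation for all regions; "ref.fa" is a placeholder path.
command = ["samtools", "faidx", "ref.fa", *regions]
print(" ".join(command))
```

The command list could be run with `subprocess.run(command, capture_output=True, text=True)`. For very large variant sets the argument list may hit the operating system's command-line length limit, so batching the regions into chunks would be a sensible precaution.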
Speeding up vcf2vcf genotyping will need to remain in my backlog. It will be a while till I can get to it. I'll leave this issue open. In the meantime, look at GetBaseCountsMultiSample. It accepts either VCF or MAF as input, and produces a MAF-like output file.
In vcf2vcf output, DP is for total depth. In other VCFs, DP4 lists 4 values for fwd/rev read counts of the REF/ALT alleles, but it does not work for multi-allelic ALTs, since all ALT alleles share the same two values. So mpileup uses ADF and ADR instead to represent fwd/rev read counts for all alleles.
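To illustrate the relationship between the tags, here is a small sketch that collapses ADF/ADR values into a DP4-style tuple. It assumes ADF and ADR are comma-separated per-allele depths with REF first, and that DP4 is ordered ref-fwd, ref-rev, alt-fwd, alt-rev. The function name `dp4_like` is made up for this example, and the numbers are not guaranteed to match a real DP4 value, since the tags may be computed with different read and base filters.

```python
def dp4_like(adf, adr):
    """Collapse per-allele strand depths into a DP4-style tuple.

    `adf` and `adr` are comma-separated strings such as "10,3,2",
    listing forward and reverse depths for REF, ALT1, ALT2, ...
    Hypothetical helper for illustration only.
    """
    fwd = [int(x) for x in adf.split(",")]
    rev = [int(x) for x in adr.split(",")]
    if len(fwd) != len(rev):
        raise ValueError("ADF and ADR must list the same number of alleles")
    # All ALT alleles are summed, which is exactly the information
    # a 4-value tag cannot keep apart at multi-allelic sites.
    return (fwd[0], rev[0], sum(fwd[1:]), sum(rev[1:]))


# Two ALT alleles: per-allele counts survive in ADF/ADR but not in DP4.
print(dp4_like("10,3,2", "8,1,0"))
```

Going the other way is not possible at a multi-allelic site: once the ALT counts are summed, the per-allele split cannot be recovered.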