vg
Different versions of vg produce different GAM and VCF files when mapping with vg giraffe and calling variants
PLEASE DO NOT MAKE SUPPORT REQUESTS HERE
Please use the Biostars forum instead:
https://www.biostars.org/new/post/?tag_val=vg
Hi, I ran vg giraffe with vg 1.35 and vg 1.38 on the same FASTQ file, but I got different .gam and .vcf outputs. With vg 1.35, the .gam file is 17G and the .vcf.gz file is 9M. With vg 1.38, the .gam file is 18G and the .vcf.gz file is 12M. Why would different versions cause this?
We don't actually guarantee identical GAM or VCF output between minor releases. In each version, we fix bugs or make algorithm or parameter changes that could result in different output, especially for tools like Giraffe which rely heavily on heuristics and don't produce a single optimal "correct" answer.
To work this out in detail, you would want to look at the changelogs for the releases after 1.35, up to 1.38:
https://github.com/vgteam/vg/releases/tag/v1.36.0
https://github.com/vgteam/vg/releases/tag/v1.37.0
https://github.com/vgteam/vg/releases/tag/v1.38.0
For example, in 1.36 we changed Giraffe seeding. We bill this as a speed improvement, but it could also result in more or different seeds being picked, which could in turn lead to different alignments:
- Giraffe no longer uses duplicate minimizers as often for seeds, potentially increasing mapping speed.
We also started adding more annotations to the Giraffe GAM output, which might make it larger:
- Giraffe records read and pair mapping wall clock times
If you're concerned that the new GAM files are not just different but might be worse, you can use vg stats -a whatever.gam to get some statistics about the alignments, which you can compare.
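As a rough sketch of that comparison, something like the following could run vg stats -a on both GAM files and show the differences side by side. The function name compare_gam_stats and the file names in the usage line are made up for illustration; only vg stats -a comes from the advice above, and the exact statistics it prints may vary between vg versions.

```shell
#!/bin/sh
# Sketch: compare `vg stats -a` summaries for two GAM files.
# compare_gam_stats is a hypothetical helper, not part of vg.

compare_gam_stats() {
    old_gam=$1
    new_gam=$2

    # Scratch directory for the two stats reports.
    tmpdir=$(mktemp -d) || return 2

    # Collect alignment statistics for each GAM file.
    if ! vg stats -a "$old_gam" > "$tmpdir/old.txt" ||
       ! vg stats -a "$new_gam" > "$tmpdir/new.txt"; then
        rm -rf "$tmpdir"
        return 2
    fi

    # diff exits 0 if the reports match and 1 if they differ.
    diff -u "$tmpdir/old.txt" "$tmpdir/new.txt"
    status=$?

    rm -rf "$tmpdir"
    return "$status"
}

# Example usage (placeholder file names):
# compare_gam_stats sample.vg1.35.gam sample.vg1.38.gam
```

Using the same vg binary to compute the stats for both files keeps the report format identical, so any lines in the diff reflect differences in the alignments rather than in the stats output itself.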