vg
The number of variations in the pan-genome is reduced compared to the variations in the input VCF file
Does vg filter out some variants during construction of the pan-genome, and if so, what are the filtering criteria? The command I used is "vg autoindex --workflow giraffe -v sv.vcf -r ref.genome -p sv -t 64". After running vg deconstruct on the resulting graph, the number of variants is almost half the number in the VCF used to construct the pan-genome graph. What could be the reason for this?
The usual reason is that the VCF contains overlapping variants, and vg deconstruct combines them into a single variant. When variants overlap, they don't exist separately in the graph.
If you are dealing with structural variants, vg construct may filter out some of them. See the wiki for further information.
On a more fundamental level, VCF is inadequate for storing anything beyond simple non-overlapping edits to the reference. The standard does not fully specify how overlapping variants should be interpreted, and different tools often have subtly incompatible interpretations.
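To get a rough idea of how much of the drop could come from overlapping variants, one option is to count how many input records share reference positions with each other. The following is only a minimal sketch and not how vg deconstruct works internally: it assumes an uncompressed, tab-separated VCF, takes the reference span of each record from POS and the length of REF (or INFO END when present), and groups records whose spans overlap. The function names `ref_interval` and `count_overlap_clusters` are made up for this illustration.

```python
from collections import defaultdict


def ref_interval(pos, ref, info):
    """Return the half-open reference interval [start, end) covered by a record.

    Uses INFO END when present (common for symbolic SV alleles),
    otherwise POS plus the length of REF.
    """
    start = pos - 1                      # VCF POS is 1-based
    end = start + len(ref)
    for field in info.split(";"):
        if field.startswith("END="):
            try:
                end = max(end, int(field[4:]))  # END is 1-based inclusive
            except ValueError:
                pass                     # ignore malformed END values
    return start, end


def count_overlap_clusters(vcf_lines):
    """Return (number of records, number of clusters of overlapping records)."""
    by_chrom = defaultdict(list)
    n_records = 0
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue                     # skip blank lines and headers
        cols = line.rstrip("\n").split("\t")
        info = cols[7] if len(cols) > 7 else ""
        by_chrom[cols[0]].append(ref_interval(int(cols[1]), cols[3], info))
        n_records += 1

    n_clusters = 0
    for intervals in by_chrom.values():
        intervals.sort()
        current_end = None
        for start, end in intervals:
            if current_end is None or start >= current_end:
                n_clusters += 1          # no overlap: start a new cluster
                current_end = end
            else:
                current_end = max(current_end, end)  # extend current cluster
    return n_records, n_clusters
```

For the file in the question, this could be called as `count_overlap_clusters(open("sv.vcf"))`. If the cluster count is much smaller than the record count, overlapping variants are a plausible explanation for most of the difference. The numbers will not match the vg deconstruct output exactly, because this sketch only looks at reference coordinates and knows nothing about the graph.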
Thank you! Your suggestions have inspired me.