The number of variations in the pan-genome is reduced compared to the variations in the input VCF file

Open wk1352313 opened this issue 1 year ago • 2 comments

Does vg filter out some variants during construction of the pan-genome, and if so, what are the filtering criteria? The command I used is "vg autoindex --workflow giraffe -v sv.vcf -r ref.genome -p sv -t 64". After running vg deconstruct on the resulting graph, the number of variants was almost half the number in the VCF used to construct the pan-genome graph. What could be the reason for this?

wk1352313 avatar Nov 24 '23 16:11 wk1352313

The usual reason is that the VCF contains overlapping variants, and vg deconstruct combines them into a single variant. When variants overlap, they don't exist separately in the graph.
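To get a rough sense of how much overlap a VCF contains, one can count the clusters of records whose reference spans overlap. The sketch below is only an approximation and not what vg deconstruct actually does: the function name `count_overlap_clusters` is hypothetical, and it assumes the reference span of each record is given by the length of its REF allele, which does not hold for symbolic structural-variant alleles such as `<DEL>`.

```python
from collections import defaultdict


def count_overlap_clusters(vcf_lines):
    """Return (number of records, number of clusters of overlapping records)."""
    spans = defaultdict(list)
    n_records = 0
    for line in vcf_lines:
        # Skip blank lines and VCF header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref = fields[0], int(fields[1]), fields[3]
        # Assumption: the REF allele length defines the reference span.
        end = pos + len(ref) - 1
        spans[chrom].append((pos, end))
        n_records += 1

    n_clusters = 0
    for intervals in spans.values():
        intervals.sort()
        current_end = None
        for start, end in intervals:
            if current_end is None or start > current_end:
                # No overlap with the running cluster: start a new one.
                n_clusters += 1
                current_end = end
            else:
                # Overlaps the running cluster: extend it.
                current_end = max(current_end, end)
    return n_records, n_clusters


# Toy input: the first two records overlap, the third stands alone.
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT",
    "chr1\t100\t.\tACGT\tA",
    "chr1\t102\t.\tG\tT",
    "chr1\t200\t.\tC\tG",
]
records, clusters = count_overlap_clusters(example)
print(records, clusters)
```

A large gap between the record count and the cluster count suggests that merging of overlapping variants could account for much of the difference.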

If you are dealing with structural variants, vg construct may filter out some of them. See the wiki for further information.

At a more fundamental level, VCF is inadequate for storing anything beyond simple non-overlapping edits to the reference. The standard does not fully specify how overlapping variants should be interpreted, and different tools often have subtly incompatible interpretations.

jltsiren avatar Nov 24 '23 22:11 jltsiren

Thank you! Your suggestions have inspired me.

wk1352313 avatar Nov 29 '23 08:11 wk1352313