vg
Output VCF: variant information (SNPs, indels, SVs, etc.)
1. What were you trying to do?
Understand the output VCF of vg call
2. What did you want to happen? Analyse the VCF file: first retrieve which kind of variant is present at each position from the INFO field, as in a "usual" VCF (for example via SVTYPE).
3. What actually happened? There is no such information in the output VCF. Example of output:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT T004
Chr01 223 >15770416>8 AT A 20.482 PASS AT=>15770416>15770417>8,>15770416>8;DP=28 GT:DP:AD:GL:GQ:GP:XD:MAD 0/1:28:24,4:-7.56791,-6.00864,-53.1783:22:-1.10412:19.5098:4
4. If you got a line like Stack trace path: /somewhere/on/your/computer/stacktrace.txt, please copy-paste the contents of that file here:
NONE
5. What data and command can the vg dev team use to make the problem happen?
I used the usual vg command pipeline: construct > giraffe > augment > snarls-pack > call
6. What does running vg version say?
vg version v1.40.0 "Suardi"
Compiled with g++ (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0 on Linux
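For readers following along, here is a minimal sketch of how the example record in item 3 can be unpacked with plain Python. It assumes the fields are tab-separated, as in a real VCF (the paste above shows spaces), and `parse_record` is a hypothetical helper, not part of vg. As far as I understand the header written by vg call, the AT tag lists one graph traversal per allele (reference first), and the ID column names the snarl; treat that reading as an assumption and check the `##INFO` lines of your own file.

```python
# The record from item 3, rebuilt with tab separators (assumption: real VCF layout).
record = "\t".join([
    "Chr01", "223", ">15770416>8", "AT", "A", "20.482", "PASS",
    "AT=>15770416>15770417>8,>15770416>8;DP=28",
    "GT:DP:AD:GL:GQ:GP:XD:MAD",
    "0/1:28:24,4:-7.56791,-6.00864,-53.1783:22:-1.10412:19.5098:4",
])


def parse_record(line):
    """Split one single-sample VCF data line into a dictionary (hypothetical helper)."""
    chrom, pos, vid, ref, alt, qual, flt, info, fmt, sample = line.split("\t")
    info_dict = {}
    for entry in info.split(";"):
        # Split on the first '=' only: AT values themselves start with '>'.
        key, _, value = entry.partition("=")
        info_dict[key] = value
    return {
        "chrom": chrom,
        "pos": int(pos),
        "id": vid,
        "ref": ref,
        "alts": alt.split(","),
        "info": info_dict,
        "sample": dict(zip(fmt.split(":"), sample.split(":"))),
    }


parsed = parse_record(record)
traversals = parsed["info"]["AT"].split(",")
print(parsed["ref"], parsed["alts"], parsed["sample"]["GT"])
print(traversals)
```

Here REF is `AT` and ALT is `A`, so the heterozygous call describes a 1 bp deletion even though no SVTYPE-like tag says so.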
"Analyse VCF file, first retrieve which kind of variant is present in each position from INFO field as ''usual'' VCF as in SVTYPE in this example"
I have the same question. How can I get the SVTYPE from the output VCF file of vg?
From the VCF spec:
##INFO=<ID=SVTYPE,Number=1,Type=String,Description="Type of structural variant">
Value should be one of DEL, INS, DUP, INV, CNV, BND. This key can be derived from the REF/ALT fields but is
useful for filtering
The two issues for vg are:
- Number=1 means there can only be one SVTYPE per site, but a vg graph can often contain many different types of SVs at the same site.
- Sometimes it's difficult to assign an allele to one of the simple categories (i.e. events can be more complex).
But that said, I think you raise a fair point: we should at the very least provide scripts or suggested best practices for cleaning the VCFs and categorizing the SV calls, as we end up doing this ourselves too when analyzing them.
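Until something official exists, a rough per-allele classification can be derived from REF/ALT alone, as the spec excerpt suggests. The sketch below is only one possible convention, not what vg or any other tool does: `classify_allele` and `classify_site` are hypothetical names, the SNP/MNP/INS/DEL/COMPLEX labels are my own choice, and it assumes sequence-resolved alleles (no symbolic alleles such as `<DEL>` or `*`). It returns one label per ALT allele, which sidesteps the Number=1 limitation described above, and it falls back to COMPLEX when an allele is not a clean insertion or deletion.

```python
def classify_allele(ref, alt):
    """Return (kind, length_change) for one sequence-resolved ALT allele.

    Assumed convention (not from vg): a clean INS/DEL keeps exactly one
    shared anchor base; anything else that changes length is COMPLEX.
    """
    ref, alt = ref.upper(), alt.upper()
    svlen = len(alt) - len(ref)
    if len(ref) == 1 and len(alt) == 1:
        kind = "SNP"
    elif svlen == 0:
        # Same-length substitution; an inversion would also land here undetected.
        kind = "MNP"
    elif svlen > 0 and len(ref) == 1 and alt[0] == ref:
        kind = "INS"
    elif svlen < 0 and len(alt) == 1 and ref[0] == alt:
        kind = "DEL"
    else:
        kind = "COMPLEX"
    return kind, svlen


def classify_site(ref, alt_field):
    """Classify every ALT allele of a site; alt_field is the raw comma-separated ALT column."""
    return [classify_allele(ref, alt) for alt in alt_field.split(",")]


# The example record from the original post: REF=AT, ALT=A.
print(classify_site("AT", "A"))
# A multi-allelic site with two different kinds of allele.
print(classify_site("A", "ATTT,G"))
```

Whether an INS or DEL counts as an SV is a separate decision; a common choice is to require an absolute length change of at least 50 bp, but that threshold is a convention rather than something the VCF spec mandates.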
Has there been any progress regarding best practices when it comes to populating the SVTYPE and possibly the SVLEN field in a VCF generated from a pangenome graph? It would be useful to be able to make comparisons to VCFs produced by sniffles, etc. Thanks!
Hi, @glennhickey. It is hard to understand the INFO field in the vg call output. Users care most about variant information such as SV position, SV type, and SV ID, as given in the input VCF file for the autoindex-giraffe-pack-call workflow, so it would be helpful for vg call to output the original SV information. I sincerely hope the vg team can address this problem.
For anyone coming across this issue with the same problem, I have found the 'truvari' tool useful for populating the INFO field of pangenome-derived VCFs. Running 'truvari anno' (https://github.com/acenglish/truvari/wiki/anno) allows you to add the SVLEN and SVTYPE tags. However, it can only accurately label straightforward insertions and deletions; everything else is tagged as 'UNK', so this isn't a perfect solution. It would be great to be able to compare the output with a VCF derived from a tool such as sniffles.
Hi, @evcurran. I solved the problem in a similar way. But the key problem is comparing the VCF generated by vg call with the original input VCF: I find it hard to compare these two VCF files since the variant coordinates are different.
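One possible reason for differing coordinates is that the same event is written with different amounts of flanking sequence in the two files. A minimal sketch of trimming shared bases, so that both records are reduced to a comparable representation, is shown below. `trim_alleles` is a hypothetical helper and covers only the trimming half of normalization; it does not left-align variants in repeats, which needs the reference sequence (for example via `bcftools norm` with a reference FASTA), and it will not reconcile sites that vg has genuinely merged or split.

```python
def trim_alleles(pos, ref, alt):
    """Trim bases shared by REF and ALT, keeping at least one base in each.

    Returns the adjusted (pos, ref, alt). Only trimming is done here; no
    left-alignment against a reference is attempted.
    """
    # Trim the shared suffix first; POS does not change.
    while len(ref) > 1 and len(alt) > 1 and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    # Then trim the shared prefix; each removed base shifts POS right by one.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


# A deletion padded with extra trailing context collapses to its minimal form.
print(trim_alleles(100, "ACGTT", "ATT"))
# A substitution padded with leading context moves to the changed base.
print(trim_alleles(10, "GGA", "GGC"))
# The example record from the original post is already minimal.
print(trim_alleles(223, "AT", "A"))
```

Applying the same trimming to both VCFs before matching on (CHROM, POS, REF, ALT) removes differences that come purely from padding; remaining mismatches then need left-alignment or a comparison tool that tolerates small positional offsets.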