truvari
bench SVs without sequences
Hi,
I am wondering how the size of SVs is calculated (for --pctsize 0.3) when the SVs lack sequences, e.g., when the ALT column is <INS> in a VCF file.
Best regards,
For example,
In the comp VCF:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Yao
chr1 10628 Sniffles2.INS.1 N INS:97 . . AC=2;AN=2 GT:HDS 1|1:0.998,0.736
chr1 67911 Sniffles2.DEL.1 N DEL:-426 . . AC=1;AN=2 GT:HDS 0|1:0.96,0.011
......
In the base VCF:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT YAO
chr1 90258 PAV-INS-59 A AGTCCCTCTGTCTCTGCCAACCAGTTAACCCCCCCTGCTGCTTTCCCTCT . . . GT .|1
chr1 714324 PAV-DEL-50 GTAGAAGAATATGAGACATTTCCCTAATCCCCCATTATGTGTAATTACAAT G . . . GT 1|1
.....
I got these results:
{
  "TP-base": 0,
  "TP-comp": 0,
  "FP": 0,
  "FN": 21353,
  "precision": null,
  "recall": null,
  "f1": null,
  "base cnt": 21353,
  "comp cnt": 0,
  "TP-comp_TP-gt": 0,
  "TP-comp_FP-gt": 0,
  "TP-base_TP-gt": 0,
  "TP-base_FP-gt": 0,
  "gt_concordance": 0,
  "gt_matrix": {}
}
with these parameters: -C 1000 -O 0.0 -p 0.0 -P 0.3 -s 50 -S 15 --sizemax 100000
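The key number in that output is "comp cnt": 0. No comparison calls were counted at all, so every metric is undefined. The sketch below shows how the reported metrics relate to the counts, using the standard definitions (precision = TP-comp / (TP-comp + FP), recall = TP-base / (TP-base + FN)). The function name `performance` is hypothetical, and the "leave everything null when there are no comp calls" rule is an assumption chosen to match the output above, not truvari's source.

```python
from typing import Dict, Optional


def performance(tp_base: int, tp_comp: int, fp: int, fn: int) -> Dict[str, Optional[float]]:
    """Hypothetical helper: precision/recall/F1 from truvari-style counts (assumed behavior)."""
    stats: Dict[str, Optional[float]] = {"precision": None, "recall": None, "f1": None}
    # Assumption: with no comparison calls (TP-comp + FP == 0), all metrics stay null,
    # which is consistent with the all-null output above.
    if tp_comp + fp == 0:
        return stats
    precision = tp_comp / (tp_comp + fp)
    # Recall is undefined when the base set is empty.
    recall = tp_base / (tp_base + fn) if tp_base + fn > 0 else None
    stats["precision"] = precision
    stats["recall"] = recall
    # F1 is the harmonic mean of precision and recall, when both are usable.
    if recall is not None and precision + recall > 0:
        stats["f1"] = 2 * precision * recall / (precision + recall)
    return stats


# The counts reported above: 21353 base calls, all FN, and zero comp calls.
print(performance(tp_base=0, tp_comp=0, fp=0, fn=21353))
```

A zero comp count usually means the comp calls were dropped before matching (for example by size filtering), rather than matched and rejected.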
The logic for determining size follows the VCF format specification and is documented here.
My first guess would be that your comp VCF is missing INFO/SVLEN and INFO/END, which are required for unresolved (e.g. <DEL> in the ALT column) SVs.
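To illustrate why the missing INFO fields matter, here is a minimal sketch of size determination under VCF-spec conventions. It is not truvari's actual implementation, and all function names are hypothetical. It assumes three things: a symbolic ALT (e.g. <INS>, <DEL>) takes its size from INFO/SVLEN, or from END - POS as a fallback; a sequence-resolved ALT takes its size from the allele length difference; and --pctsize compares sizes as smaller/larger.

```python
from typing import Dict, Optional


def sv_size(pos: int, ref: str, alt: str, info: Dict[str, str]) -> Optional[int]:
    """Hypothetical helper: estimate an SV's length from one VCF record (sketch only)."""
    if alt.startswith("<") and alt.endswith(">"):
        # Symbolic ALT: there is no sequence, so INFO must carry the length.
        if "SVLEN" in info:
            return abs(int(info["SVLEN"]))  # SVLEN may be negative for deletions
        if "END" in info:
            # END - POS spans the reference; for an insertion this is ~0 and
            # does not reflect the inserted length, which is why SVLEN is needed.
            return int(info["END"]) - pos
        return None  # size cannot be determined
    # Sequence-resolved ALT: size is the length difference between the alleles.
    return abs(len(alt) - len(ref))


def size_similarity(size_a: int, size_b: int) -> float:
    """Assumed form of the --pctsize comparison: smaller size / larger size."""
    largest = max(size_a, size_b)
    if largest == 0:
        return 0.0  # no usable size; treat as dissimilar
    return min(size_a, size_b) / largest


def passes_size_filter(size: Optional[int], sizemin: int, sizemax: int) -> bool:
    """Keep a call only if its size is known and within [sizemin, sizemax]."""
    return size is not None and sizemin <= size <= sizemax


# A comp-style <INS> with no SVLEN/END has no usable size and would be dropped
# (here with -S 15 and --sizemax 100000 as the bounds).
print(passes_size_filter(sv_size(10628, "N", "<INS>", {}), 15, 100000))
# With INFO/SVLEN=97 added, the same record has a size and passes.
print(passes_size_filter(sv_size(10628, "N", "<INS>", {"SVLEN": "97"}), 15, 100000))
```

Under these assumptions, comp records without INFO/SVLEN never get a size and are filtered out before matching. That would be consistent with "comp cnt": 0 above.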
Thanks! It works after I added the INFO/SVLEN.
Best regards, Weiyang