
bench SVs without sequences

Open WeiYang-BAI opened this issue 1 year ago • 2 comments

Hi,

I am wondering how the size of an SV (as used by --pctsize 0.3) is calculated when the SV lacks a sequence, e.g., when the ALT column is <INS> in a VCF file.

Best regards,

WeiYang-BAI avatar Oct 23 '24 05:10 WeiYang-BAI

For example,

In comp vcf:

#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Yao
chr1 10628 Sniffles2.INS.1 N INS:97 . . AC=2;AN=2 GT:HDS 1|1:0.998,0.736
chr1 67911 Sniffles2.DEL.1 N DEL:-426 . . AC=1;AN=2 GT:HDS 0|1:0.96,0.011
......

In base vcf:

#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT YAO
chr1 90258 PAV-INS-59 A AGTCCCTCTGTCTCTGCCAACCAGTTAACCCCCCCTGCTGCTTTCCCTCT . . . GT .|1
chr1 714324 PAV-DEL-50 GTAGAAGAATATGAGACATTTCCCTAATCCCCCATTATGTGTAATTACAAT G . . . GT 1|1
.....

I got results like this:

{
  "TP-base": 0,
  "TP-comp": 0,
  "FP": 0,
  "FN": 21353,
  "precision": null,
  "recall": null,
  "f1": null,
  "base cnt": 21353,
  "comp cnt": 0,
  "TP-comp_TP-gt": 0,
  "TP-comp_FP-gt": 0,
  "TP-base_TP-gt": 0,
  "TP-base_FP-gt": 0,
  "gt_concordance": 0,
  "gt_matrix": {}
}

with the params: -C 1000 -O 0.0 -p 0.0 -P 0.3 -s 50 -S 15 --sizemax 100000

WeiYang-BAI avatar Oct 23 '24 08:10 WeiYang-BAI

The logic for determining size is based on the VCF format specification and is documented here.

My first guess would be that your comp VCF is missing INFO/SVLEN and INFO/END, which are required for unresolved (e.g. <DEL> in the ALT column) SVs.
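To make the point above concrete, here is a minimal Python sketch of how a size could be derived for a VCF record. It is only an illustration under stated assumptions, not truvari's actual code: the helper names `parse_info` and `estimate_sv_size` are hypothetical, and the fallback order (SVLEN first, then END minus POS for symbolic alleles, then the REF/ALT length difference) is an assumption based on the VCF fields named in this thread.

```python
def parse_info(field):
    """Turn a VCF INFO string such as 'AC=1;AN=2' into a dict (hypothetical helper)."""
    info = {}
    if field in (".", ""):
        return info
    for item in field.split(";"):
        key, _, value = item.partition("=")
        info[key] = value
    return info


def estimate_sv_size(pos, ref, alt, info):
    """Illustrative size estimate for one record; NOT truvari's implementation.

    Returns None when the record carries no usable size information.
    """
    # Assumption: an explicit SVLEN wins; take the first value if it is a list.
    if "SVLEN" in info:
        return abs(int(info["SVLEN"].split(",")[0]))
    # Symbolic (unresolved) alleles such as <DEL> have no sequence to measure,
    # so the only remaining source of a length is END - POS.
    if alt.startswith("<") and alt.endswith(">"):
        if "END" in info:
            return abs(int(info["END"]) - pos)
        return None
    # Resolved alleles: use the difference between ALT and REF sequence lengths.
    return abs(len(alt) - len(ref))


# An unresolved deletion with neither SVLEN nor END has no determinable size.
print(estimate_sv_size(67911, "N", "<DEL>", parse_info("AC=1;AN=2")))
# The same record with SVLEN added can be sized.
print(estimate_sv_size(67911, "N", "<DEL>", parse_info("AC=1;AN=2;SVLEN=-426")))
```

In this sketch, a symbolic allele without SVLEN or END yields no size at all, which is the situation the reply suspects for the comp VCF.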

ACEnglish avatar Oct 23 '24 15:10 ACEnglish

Thanks! It works after I added the INFO/SVLEN.
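The thread does not say how INFO/SVLEN was added. As one possible approach, the sketch below appends SVLEN to a tab-separated record line, assuming the ALT field encodes the length in the `INS:97` / `DEL:-426` style shown in the comp VCF excerpt earlier in this thread. The function name `add_svlen` and the regular expression are hypothetical; a real file would also need an `##INFO` header line declaring SVLEN, which this sketch does not write.

```python
import re

# Assumption: ALT looks like INS:97 or DEL:-426, optionally wrapped in < >.
ALT_WITH_LEN = re.compile(r"^<?(INS|DEL):(-?\d+)>?$")


def add_svlen(record_line):
    """Return the record with SVLEN appended to INFO when ALT encodes a length."""
    fields = record_line.rstrip("\n").split("\t")
    match = ALT_WITH_LEN.match(fields[4])
    # Leave the record unchanged if ALT has no length or SVLEN already exists.
    if match is None or "SVLEN=" in fields[7]:
        return "\t".join(fields)
    svlen = int(match.group(2))
    info = fields[7]
    # An empty INFO column is written as '.', so replace it instead of appending.
    fields[7] = f"SVLEN={svlen}" if info in (".", "") else f"{info};SVLEN={svlen}"
    return "\t".join(fields)


line = "chr1\t67911\tSniffles2.DEL.1\tN\tDEL:-426\t.\t.\tAC=1;AN=2\tGT:HDS\t0|1:0.96,0.011"
print(add_svlen(line))
```

Records whose ALT does not match the assumed pattern, such as the fully sequenced alleles in the base VCF, pass through unchanged.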

Best regards, Weiyang

WeiYang-BAI avatar Oct 24 '24 06:10 WeiYang-BAI