SVclone VAF calculation details

I am looking at the SV VAF output. Do you have a more detailed description on how "vaf1" and "vaf2" were derived?

Feb 20 '25 02:02 lwtan90

The fields are calculated as the number of reads supporting the breakpoint divided by the supporting plus normal (unaffected) reads for each locus (left side = vaf1 and right side = vaf2). For more details please have a look at page 35 of my thesis.

Feb 20 '25 23:02 mcmero

Thank you! How is "normal" reads decided? The reason was when we compared with GRIDSS2 and Manta, the "adjusted" and "unadjusted" VAF is higher. Wondering if this is because "normal" read is under-counted?

Btw, I can't seem to access your thesis. :)

Feb 24 '25 17:02 lwtan90

Normal reads have to pass quite a few checks in SVclone, which leans more on the conservative side for read counting. So I'm not surprised that the counts are lower vs. other other methods.

Normal reads have to pass these checks:

Insert distance is not anomalous (less than 3 * standard deviation of average insert size and greater than 2 * read length)
Read crosses the break boundary by at least norm_overlap
Neither read is soft-clipped by 2bp or more

My thesis is open-access and freely available as a PDF, here is a direct link.

Feb 25 '25 00:02 mcmero

Thank you! Just to clarify, read crosses the break boundary by at least 2bp: are you considering this as a paired-end read / fragment (R1 and R2), which means as long as the pairs span the boundary? or as long as one of the reads with no soft-clipping physically crosses the breakpoint (which means the breakpoint must fall within the read) ?

Feb 26 '25 01:02 lwtan90

Yes, it will also count reads spanning the boundary.

Feb 26 '25 02:02 mcmero

Thank you! I just saw in the config file there is a parameter: norm_overlap: 10 sc_len: 10

Based on your experience, what is an optimal threshold?

Feb 26 '25 02:02 lwtan90

Go with the defaults unless you have a specific reason to change them.

Feb 26 '25 05:02 mcmero

Thank you for the prompt response! Appreciate it. My last question would be whether SVClone will check if the soft-clipped sequences match the loci of its partner's breakpoint sequences because including the read in the calculation?

Feb 26 '25 05:02 lwtan90

No, SVclone doesn't check this. This would be a good additional check but was never implemented. We assume sufficiently soft-clipped reads at the SV location support the break, but this may not always be the case.

Feb 26 '25 22:02 mcmero

Can I clarify how is Inversion VAF calculated? (I have read the thesis, but I could not find anything satisfying)

My question is in terms of how are support reads considered / counted? I saw on the dataset I am dealing with to have only 1 spanning read and for an inversion event, although I am seeing a lot more supporting reads on IGV.

Additional, Manta and GRIDSS2 recorded higher VAFs (0.12), where unadjusted SVClone's VAF was only 0.02.

Mar 21 '25 20:03 lwtan90

Supporting reads for inversions are not counted any differently to other SVs. One noteworthy aspect of inversions however is that they create two breakpoints in opposite directions, so make sure that your directionality is correct in the SVclone input.

Mar 24 '25 04:03 mcmero

Thank you!

I rerun SVClone without directional input from the SV caller (manta in my case). The VAF was higher using SVClone inferred directionality. When I looked at the orientation, they were assigned either +/+ or -/-. I was under the assumption that Inversion event as a whole should be +/-?

Should I break each inversion event into two breakpoints prior as input for SVClone?

Mar 25 '25 23:03 lwtan90

Each break-end pair that compose an inversion will be +/+ and -/-. Have a look at Supplementary Figure 8 of the paper.

Mar 26 '25 03:03 mcmero