VAF calculation details
I am looking at the SV VAF output. Do you have a more detailed description on how "vaf1" and "vaf2" were derived?
The fields are calculated as the number of reads supporting the breakpoint divided by the supporting plus normal (unaffected) reads for each locus (left side = vaf1 and right side = vaf2). For more details please have a look at page 35 of my thesis.
Thank you! How is "normal" reads decided? The reason was when we compared with GRIDSS2 and Manta, the "adjusted" and "unadjusted" VAF is higher. Wondering if this is because "normal" read is under-counted?
Btw, I can't seem to access your thesis. :)
Normal reads have to pass quite a few checks in SVclone, which leans more on the conservative side for read counting. So I'm not surprised that the counts are lower vs. other other methods.
Normal reads have to pass these checks:
- Insert distance is not anomalous (less than 3 * standard deviation of average insert size and greater than 2 * read length)
- Read crosses the break boundary by at least
norm_overlap - Neither read is soft-clipped by 2bp or more
My thesis is open-access and freely available as a PDF, here is a direct link.
Thank you! Just to clarify, read crosses the break boundary by at least 2bp: are you considering this as a paired-end read / fragment (R1 and R2), which means as long as the pairs span the boundary? or as long as one of the reads with no soft-clipping physically crosses the breakpoint (which means the breakpoint must fall within the read) ?
Yes, it will also count reads spanning the boundary.
Thank you! I just saw in the config file there is a parameter: norm_overlap: 10 sc_len: 10
Based on your experience, what is an optimal threshold?
Go with the defaults unless you have a specific reason to change them.
Thank you for the prompt response! Appreciate it. My last question would be whether SVClone will check if the soft-clipped sequences match the loci of its partner's breakpoint sequences because including the read in the calculation?
No, SVclone doesn't check this. This would be a good additional check but was never implemented. We assume sufficiently soft-clipped reads at the SV location support the break, but this may not always be the case.
Can I clarify how is Inversion VAF calculated? (I have read the thesis, but I could not find anything satisfying)
My question is in terms of how are support reads considered / counted? I saw on the dataset I am dealing with to have only 1 spanning read and for an inversion event, although I am seeing a lot more supporting reads on IGV.
Additional, Manta and GRIDSS2 recorded higher VAFs (0.12), where unadjusted SVClone's VAF was only 0.02.
Supporting reads for inversions are not counted any differently to other SVs. One noteworthy aspect of inversions however is that they create two breakpoints in opposite directions, so make sure that your directionality is correct in the SVclone input.
Thank you!
I rerun SVClone without directional input from the SV caller (manta in my case). The VAF was higher using SVClone inferred directionality. When I looked at the orientation, they were assigned either +/+ or -/-. I was under the assumption that Inversion event as a whole should be +/-?
Should I break each inversion event into two breakpoints prior as input for SVClone?
Each break-end pair that compose an inversion will be +/+ and -/-. Have a look at Supplementary Figure 8 of the paper.