
INV, DEL, DUP with huge SVLENs in sniffles 2.0.7

Open ddrichel opened this issue 1 year ago • 7 comments

Re-opening the previously reported issue https://github.com/fritzsedlazeck/Sniffles/issues/235, as it persists in sniffles 2.0.7 and we are still looking for a reasonable way to filter FP calls without losing too many TPs. According to sniffles 2.0.7, the largest SVs in a dataset unrelated to the previous report are, by SV type: a 149 Mb DUP on chr4, a 131 Mb INV on chr1, and a 64.5 Mb DEL on chr3.

Kind regards

Dmitriy

ddrichel avatar Mar 27 '23 17:03 ddrichel

Thanks Dmitriy, yes, this can happen. Sometimes these are real events and sometimes they just indicate a transposon jump. Worst case, they are alignment errors. We are working on a better annotation for these events. Maybe we should just refer to them as BND? The issue here is really that there is no signal differentiating between an interchromosomal BND and a large event such as an INV.

Thanks Fritz

fritzsedlazeck avatar Jun 15 '23 15:06 fritzsedlazeck

Hello Fritz, in the example I worked out here: https://github.com/fritzsedlazeck/Sniffles/issues/235 it is a clear case of Sniffles' internal logic getting confused about the start and end positions of the SV when an intrachromosomal transposition is inverted. In cases like this, there is no reason to assume that this is not a real event - alignment errors are always a possible problem, independent of this specific issue. Here, Sniffles combines one coordinate from the position where the transposon ends up and one from the position where it originates. This is due to the additional inversion, which is not accounted for in the calculation. Catching transpositions+inversions would eliminate at least some of these artifacts - maybe all? If not, one could follow up the remaining ones on a case-by-case basis, as I did with this one. Regarding the SV type - transposition+inversion does not really fit into any SV category used by Sniffles other than BND, so I guess that would be reasonable enough, although of course a more fine-grained annotation would be ideal.

Kind regards

Dmitriy

ddrichel avatar Jun 17 '23 13:06 ddrichel

Same problem here. Did you find a solution to filter these events @ddrichel ( @fritzsedlazeck )? Otherwise, it isn't easy to use this tool, especially at the multisample level. I have SVs that span everywhere.

leone93 avatar Jan 16 '24 07:01 leone93

@leone93 unfortunately I have no solution except for a hard SVLEN cutoff

ddrichel avatar Jan 16 '24 08:01 ddrichel

That was honestly my only idea to solve this problem. Which SVLEN did you use? Thanks, Dmitriy

leone93 avatar Jan 16 '24 08:01 leone93

@leone93 it depends on whether precision or recall is more important for your purposes. In a recent preprint we chose a very anticonservative threshold of 50 Mb (I believe), see https://www.medrxiv.org/content/10.1101/2023.12.20.23300308v1

ddrichel avatar Jan 16 '24 08:01 ddrichel
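
The hard SVLEN cutoff discussed above could be sketched roughly as follows. This is a minimal illustration, not the method used in the preprint: it assumes the VCF carries `SVTYPE` and `SVLEN` in the INFO column, and the names `MAX_SVLEN`, `FILTERED_TYPES`, `keep_record`, and `filter_vcf` are hypothetical. The 50 Mb default only mirrors the value mentioned in the thread; the right threshold depends on the precision/recall trade-off.

```python
# Hypothetical cutoff: 50 Mb is the value mentioned in the thread, not a recommendation.
MAX_SVLEN = 50_000_000
# SV types to which the cutoff is applied (assumption: the types named in the issue title).
FILTERED_TYPES = {"INV", "DEL", "DUP"}


def info_to_dict(info):
    """Parse a VCF INFO column into a dict; flag entries map to True."""
    out = {}
    for item in info.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        out[key] = value if sep else True
    return out


def keep_record(line, max_svlen=MAX_SVLEN):
    """Return True if a tab-separated VCF data line passes the SVLEN cutoff."""
    fields = line.rstrip("\n").split("\t")
    info = info_to_dict(fields[7])  # INFO is the 8th VCF column
    if info.get("SVTYPE") not in FILTERED_TYPES:
        return True  # other types (e.g. INS, BND) are left alone
    raw = info.get("SVLEN")
    if not isinstance(raw, str):
        return True  # SVLEN missing or a bare flag: leave the record untouched
    try:
        svlen = abs(int(raw))  # deletions are usually reported with negative SVLEN
    except ValueError:
        return True  # unparseable SVLEN: leave the record untouched
    return svlen <= max_svlen


def filter_vcf(lines, max_svlen=MAX_SVLEN):
    """Yield header lines unchanged and data lines that pass the cutoff."""
    for line in lines:
        if line.startswith("#") or keep_record(line, max_svlen):
            yield line
```

For an uncompressed VCF on standard input, something like `sys.stdout.writelines(filter_vcf(sys.stdin))` would apply the filter. Records dropped this way may include real large events, so lowering `max_svlen` trades recall for precision.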

I will take a look. Thank you! (However, I was thinking of choosing a conservative threshold.)

leone93 avatar Jan 16 '24 08:01 leone93