Support needed: is ALT identical at base level across samples in a multi-sample VCF generated by Sniffles2?

Open maxineliu opened this issue 11 months ago • 6 comments

For the output of combined calling, Sniffles2 provides an actual sequence in the ALT column for nearly every SV, except for a few symbolic ALT representations. Does this imply that an SV present in multiple samples is identical at base level in every one of them? If not, could you describe the criterion used for this step, i.e. how is a consensus ALT sequence generated?

I searched the article published in 2024 but found no clue.

Thanks, Maxine

maxineliu avatar Mar 11 '24 21:03 maxineliu

So on a per-sample basis you can get the ALT column completely filled in by providing the reference FASTA.

You are probably asking about the merge (across different samples)? In this case one representative sequence is chosen. In the new code version (shortly before release) we also cross-check sequence identity, so the reported sequence should represent all samples closely (identity above 85% I think; I would need to look it up again, or @hermannromanek might know).
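To make the idea of "one representative sequence plus an identity check" concrete, here is a minimal sketch. It is not the Sniffles2 implementation: the function names, the rule for picking the representative (highest total similarity to the other sequences), and the use of `difflib.SequenceMatcher.ratio()` as a stand-in for alignment identity are all assumptions for illustration. Only the roughly 85% threshold comes from the reply above, and even that was stated tentatively.

```python
from difflib import SequenceMatcher


def identity(a, b):
    # Similarity ratio in [0, 1]; a rough stand-in for alignment identity (assumption).
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def pick_representative(alts, min_identity=0.85):
    """Pick one ALT sequence for a merged SV and report which samples it represents.

    alts: per-sample ALT sequences for the same merged SV.
    Selection rule (hypothetical): the sequence most similar to all the others.
    """
    best = max(alts, key=lambda s: sum(identity(s, t) for t in alts))
    agrees = [identity(best, t) >= min_identity for t in alts]
    return best, agrees


alts = ["ACGTACGTAC", "ACGTACGTAC", "ACGTACGTAC", "ACGTACGTAT", "GGGGGGGGGG"]
representative, agrees = pick_representative(alts)
```

The point of the sketch is that a shared ALT sequence in the merged VCF would mean "close to every sample's sequence", not "identical in every sample".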

fritzsedlazeck avatar Mar 11 '24 21:03 fritzsedlazeck

Thank you for answering! Another question is about combining SNFs: "An SV candidate matches a group if it has the same SV type and the sum of start position and length deviation is less than M multiplied by the square root of the smaller value between SVLength and GroupSVLength."

That's what is mentioned in the article. What is the definition of start position? I mean, if it is a chromosomal coordinate, that looks weird...

maxineliu avatar Mar 12 '24 00:03 maxineliu

Does the start position refer to a distance relative to the group's average start position? Should it be negative if the SV starts before the average group start?

maxineliu avatar Mar 12 '24 16:03 maxineliu

So start is the leftmost and stop is the rightmost position/breakpoint. The positions are defined by the read alignments (i.e. based on the split or alignment signal). We take the distribution of these signals, filter out the outliers, and then take the median position from the alignments. Hope that helps, Fritz
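As an illustration of "filter the outliers, then take the median", here is a minimal sketch. The reply does not say how outliers are filtered, so the median-absolute-deviation rule, the `max_dev` parameter, and the function name `consensus_breakpoint` are assumptions, not the Sniffles2 code.

```python
import statistics


def consensus_breakpoint(positions, max_dev=3.0):
    """Median breakpoint after dropping outlier read signals.

    positions: breakpoint coordinates supported by individual reads.
    Outlier rule (assumption): drop positions farther than max_dev times
    the median absolute deviation (MAD) from the median.
    """
    med = statistics.median(positions)
    mad = statistics.median([abs(p - med) for p in positions])
    scale = max(mad, 1)  # avoid a zero-width window when most reads agree exactly
    kept = [p for p in positions if abs(p - med) <= max_dev * scale]
    return statistics.median(kept)


# Five reads agree on roughly the same breakpoint; one read is far off.
start = consensus_breakpoint([2030, 2034, 2034, 2036, 2038, 2950])
```

With these example values the read at 2950 is discarded and the start is taken from the remaining five reads.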

fritzsedlazeck avatar Mar 12 '24 21:03 fritzsedlazeck

I'm still not quite clear. Perhaps you misunderstood my question?

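The formula that I'm struggling to understand is:

$$\text{start position} + \text{length deviation} < M \cdot \sqrt{\min(\text{SVLength},\ \text{GroupSVLength})}$$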

It belongs to the section on how to cluster SV candidates across multiple SNFs. In my understanding, every SV candidate in SNFs already has a definite start position. This position is determined in the way you explained.

This start position should be something like chr1 2034 (the base at position 2034 on chromosome 1). But if that's the case, then the left side of this equation (the sum of the start position and the length deviation) would be larger for an SV at the tail of a chromosome than for one at the beginning, which seems illogical.

maxineliu avatar Mar 13 '24 05:03 maxineliu

Oh I see, my reply was on a per-read/per-sample level. You are interested in the population/across-sample merge. So again, the start is always the leftmost and the stop is the rightmost position, so the start will always be smaller than the stop. Each position is redefined independently of the others. What basically happens is that we have a minimum wobble/tolerance of 500 bp across samples. The 500 bp are scaled to avoid over- or under-merging when the SV size (SV length) gets smaller. So the square root is just there to account for the SV sizes within a cluster. The mechanism starts with one SV from one sample as the representative, and new SVs are then added to it.

We have now changed this, btw. :) We are following much more closely what Truvari collapse does, as we observed some overmerging for smaller SVs.
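For readers with the same question, here is a minimal sketch of the matching rule as quoted from the article, together with the "start with one SV as representative, then add to it" mechanism. It assumes that "start position" in the rule means the absolute difference between the candidate's start and the group's start, not the raw chromosomal coordinate; that reading is an interpretation of the thread, not something confirmed in it. The class and function names and the value `m=10.0` are hypothetical. The sketch covers the older rule only, not the newer Truvari-like behaviour.

```python
import math
from dataclasses import dataclass, field


@dataclass
class SV:
    svtype: str
    start: int   # leftmost breakpoint (chromosomal coordinate)
    length: int  # SV length in bp, taken as a positive number here


@dataclass
class Group:
    representative: SV
    members: list = field(default_factory=list)


def matches(sv, group_sv, m):
    # Same type, and start deviation + length deviation below a size-scaled tolerance.
    if sv.svtype != group_sv.svtype:
        return False
    start_dev = abs(sv.start - group_sv.start)      # assumption: a deviation, not a coordinate
    length_dev = abs(sv.length - group_sv.length)
    return start_dev + length_dev < m * math.sqrt(min(sv.length, group_sv.length))


def cluster(svs, m):
    # The first SV seen opens a group as its representative; later matches are added.
    groups = []
    for sv in svs:
        for group in groups:
            if matches(sv, group.representative, m):
                group.members.append(sv)
                break
        else:
            groups.append(Group(representative=sv, members=[sv]))
    return groups


svs = [
    SV("DEL", 2034, 100),
    SV("DEL", 2050, 110),
    SV("DEL", 200_002_034, 100),
    SV("DEL", 200_002_050, 110),
    SV("INS", 2034, 100),
]
groups = cluster(svs, m=10.0)  # m is an arbitrary illustrative value
```

Under this reading the tolerance depends only on the SV lengths, so two SVs near the end of a chromosome are compared exactly like two SVs near its beginning.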

fritzsedlazeck avatar Mar 13 '24 11:03 fritzsedlazeck

I am closing this. Hope it's answered.

fritzsedlazeck avatar Apr 09 '24 13:04 fritzsedlazeck