svtyper

Dropping reference support in certain cases

Open mlinderm opened this issue 8 years ago • 2 comments

I noticed that svtyper forces the reference evidence of one type (e.g. RP) to 0 when there is no alternate evidence of that type but there is alternate evidence of the other type.

As an example, I was annotating a variant with the following evidence:

sr_a (ref, alt)         25             4
pe_a (ref, alt)         39             7
sr_b (ref, alt)         15             4
pe_b (ref, alt)         31             7
sr_a_scaled (ref, alt)  24.571404      3.999996
pe_a_scaled (ref, alt)  37.691501499   0.184318874161
sr_b_scaled (ref, alt)  14.999985      3.999996
pe_b_scaled (ref, alt)  30.6593866175  0.184318874161

Because pe_*_scaled (alt) rounds to 0 but there is split read alt evidence, the ref PE evidence was not reported.

Is there a reason not to report reference evidence if the alternate evidence is zero?

mlinderm avatar Mar 29 '16 21:03 mlinderm

Yeah, the rationale behind this behavior is that we know that either PE or SR evidence may be blind to capturing the alternate allele at certain loci. This may be due to repetitive sequences, assembly artifacts, or complex breakpoints that the aligner cannot map split-reads across.

So we made a decision to only consider an evidence type (PE or SR) when there is empirical data showing that we are able to capture the alternate allele with it. In practice, this is where we observe at least one non-reference alignment for that evidence type.
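For readers following along, the rule described above can be sketched roughly as below. This is only an illustration of the behavior discussed in this thread, not svtyper's actual code: the function name `gate_reference_evidence` is hypothetical, and it assumes scaled counts are rounded to the nearest integer and that a type is only suppressed when the other type does show alternate support (which matches the replicates reported in this issue).

```python
def gate_reference_evidence(ref_sr, alt_sr, ref_pe, alt_pe):
    """Illustrative sketch only; hypothetical, not svtyper's implementation."""
    # Assumption: scaled counts are rounded to the nearest integer for reporting.
    rs, as_, rp, ap = (int(round(x)) for x in (ref_sr, alt_sr, ref_pe, alt_pe))

    # Treat an evidence type as "blind" when it shows no alternate support
    # while the other type does; its reference count is then forced to 0.
    if as_ == 0 and ap > 0:
        rs = 0
    if ap == 0 and as_ > 0:
        rp = 0

    return {"RS": rs, "AS": as_, "RP": rp, "AP": ap}


# Scaled "a" side values from the original report in this issue.
print(gate_reference_evidence(24.571404, 3.999996, 37.691501499, 0.184318874161))
```

With the values from the report, the scaled PE alternate count rounds to 0 while SR alternate support remains, so the PE reference count is dropped.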

cc2qe avatar Mar 29 '16 21:03 cc2qe

Thanks @cc2qe. I understand the reasoning, but the results can then be very non-intuitive. I have been simulating data incorporating various deletions and see a bimodal distribution for RP depending on whether there is some alternate evidence or no alternate evidence. Consider the following two replicates for the same deletion (which is actually absent from the data, i.e. a 0/0 genotype):

gt   svtyper.RP  svtyper.RS  svtyper.AP  svtyper.AS
0/0  80          52          0           0
0/0  75          0           2           0

In the second replicate, because AP > 0 && AS == 0, RP is reported and RS is forced to zero, whereas in the first replicate, since both AP and AS are reported as zero, both RP and RS are reported. I would advocate for either always reporting the actual value or, if it is being forced to zero, reporting the result as '.' to indicate it is not actually zero.
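To make the second suggestion concrete, here is a minimal sketch of what I have in mind. The function name `report_counts` is hypothetical, and the suppression condition is just my reading of the behavior observed above, not taken from svtyper's source; '.' is the usual VCF missing-value marker. The RS value of 50 in the second call is made up, since the unsuppressed count for that replicate is not available.

```python
def report_counts(ref_sr, alt_sr, ref_pe, alt_pe, missing="."):
    """Hypothetical sketch: emit '.' instead of 0 for suppressed reference counts."""
    # Assumed suppression condition: no alternate support for this evidence
    # type, but alternate support from the other type.
    sr_suppressed = alt_sr == 0 and alt_pe > 0
    pe_suppressed = alt_pe == 0 and alt_sr > 0

    return {
        "RP": missing if pe_suppressed else ref_pe,
        "RS": missing if sr_suppressed else ref_sr,
        "AP": alt_pe,
        "AS": alt_sr,
    }


# First replicate: no alternate evidence at all, so nothing is suppressed.
print(report_counts(ref_sr=52, alt_sr=0, ref_pe=80, alt_pe=0))
# Second replicate: RS is suppressed and shown as '.' rather than 0.
print(report_counts(ref_sr=50, alt_sr=0, ref_pe=75, alt_pe=2))
```

This way a downstream user can tell a genuinely zero count apart from one that was withheld.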

mlinderm avatar Mar 29 '16 21:03 mlinderm