bamUtil

Strelka output using Clipped bams not the same as unclipped bams

Open ghost opened this issue 9 years ago • 15 comments

Hello,

I noticed that Strelka's variant calls differ depending on whether I use clipped BAMs (produced with ClipOverlap) or the unmodified BAMs. This is an important issue because I don't know whether I can trust the clipped BAMs. A specific example is a deletion that is supported by at least 4 unpaired reads; Strelka does not detect this deletion when using the clipped BAMs. Do you know what the problem could be?

ghost avatar Jan 13 '16 20:01 ghost

Do you think that clipOverlap clipped some reads that should not have been clipped? Can you find any reads that overlap that position that were clipped?

Do you see whether that deletion was called by Strelka but then Strelka filtered it out?

(Disclaimer: These are speculations, and I don't know what I'm talking about.)

pjvandehaar avatar Jan 13 '16 21:01 pjvandehaar

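[image: screen shot 2016-01-13 at 1 41 46 pm] https://cloud.githubusercontent.com/assets/12701400/12308702/83664f8e-b9fb-11e5-9013-3cd55eb08639.png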

This screenshot shows some of the reads supporting this deletion. Two of the read pairs have a small overlap, but not at the region of interest. Why does Strelka filter out the deletion for one set of BAM files but not the other?

ghost avatar Jan 13 '16 21:01 ghost

Can you look at those reads before and after clip overlap?

What do the flags and the cigars look like? Any change in them?

How about their mates? Did they change?

— Reply to this email directly or view it on GitHub https://github.com/statgen/bamUtil/issues/20#issuecomment-171445362.

mktrost avatar Jan 13 '16 21:01 mktrost

Sorry for the delay. I am trying to figure out what the differences are. Where can I check the flags?
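For anyone following along: the FLAG is column 2 and the CIGAR is column 6 of each SAM record, which is what `samtools view in.bam chr:start-end` prints for an indexed BAM. A minimal sketch of decoding those columns is below. The bit names follow the SAM specification; the helper names (`decode_flag`, `summarize_sam_line`) and the example record are made up for illustration and are not from the BAMs in this issue.

```python
# SAM FLAG bits as defined in the SAM specification.
FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "read1",
    0x80: "read2",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag):
    """Return the names of all bits set in a SAM FLAG value."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]


def summarize_sam_line(line):
    """Pull read name, decoded FLAG, position and CIGAR out of one SAM record."""
    fields = line.rstrip("\n").split("\t")
    return {
        "name": fields[0],
        "flags": decode_flag(int(fields[1])),
        "chrom": fields[2],
        "pos": int(fields[3]),
        "cigar": fields[5],
    }


if __name__ == "__main__":
    # Made-up record for illustration only.
    example = "readA\t99\tchr1\t1000\t60\t100M\t=\t1150\t250\tACGT\tIIII"
    print(summarize_sam_line(example))
```

Running the same summary on a read from the unclipped BAM and from the clipped BAM should make it easy to see whether anything other than the CIGAR changed, in particular whether the `unmapped` bit appeared.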

ghost avatar Jan 13 '16 23:01 ghost

There are three reads that are different: two of the reads have lower phred scores in the clipped BAMs, and the third read has a different CIGAR string, as shown in the attached image.

[image: screen shot 2016-01-13 at 3 22 51 pm]

ghost avatar Jan 13 '16 23:01 ghost

Unpaired reads should pass through ClipOverlap unchanged, so those reads should be identical with or without clipOverlap (unless you used the --overlapsOnly option, but I doubt you did that).

A CIGAR change is expected for an overlapping pair.
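To illustrate why a CIGAR change far from the deletion should not by itself remove the deletion from the alignment, here is a rough sketch that walks a CIGAR string and reports where its deletions fall on the reference. The CIGAR strings used in the comments and the function names are hypothetical, chosen only to show a read before and after soft clipping; they are not taken from the reads in this issue.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Split a CIGAR string into (length, operation) tuples."""
    return [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]


def deletions(pos, cigar):
    """Return (ref_start, length) for every deletion in an alignment.

    pos is the 1-based leftmost aligned reference position (SAM column 4).
    """
    found = []
    ref = pos
    for length, op in parse_cigar(cigar):
        if op == "D":
            found.append((ref, length))
        if op in "MDN=X":  # operations that consume reference bases
            ref += length
    return found


def aligned_bases(cigar):
    """Count read bases that remain aligned (M, =, X), ignoring clips."""
    return sum(n for n, op in parse_cigar(cigar) if op in "M=X")
```

With a made-up read at position 1000, `deletions(1000, "60M3D40M")` and `deletions(1000, "60M3D20M20S")` report the same deletion, while `aligned_bases` drops from 100 to 80. If the clip is at the start of the read instead, POS (column 4) moves as well, so each call should use the POS from its own file.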

I'm intrigued by the phred quality change. ClipOverlap should not modify any quality scores, so I find that interesting. If phreds are recalculated, it is possible the algorithm will come up with different phreds even on reads without overlaps.

Did you run any tools between ClipOverlap and Strelka?

I don't know how Strelka works, but lower phred scores could affect what it detects. So it may be worthwhile to figure out how/why they are changing.

On Wed, Jan 13, 2016 at 6:23 PM, artonmarton wrote:

[image: screen shot 2016-01-13 at 3 23 23 pm] https://cloud.githubusercontent.com/assets/12701400/12311088/9d61a678-ba09-11e5-98af-a1ea52c5fed2.png

This is one of the reads with lower phred score.

mktrost avatar Jan 14 '16 03:01 mktrost

No, I didn't run anything else between ClipOverlap and Strelka. I only made index files for the BAM files after ClipOverlap.

ghost avatar Jan 14 '16 03:01 ghost

In the code you ran in issue https://github.com/statgen/bamUtil/issues/16, were some quality scores changed? Could that be the cause? Are your unmodified bams from before that code, or after it?

(Again, disregard this comment if it's incoherent.)

pjvandehaar avatar Jan 14 '16 03:01 pjvandehaar

Please ignore the phred score; that is not the issue. It was my mistake: I had not selected the right base pair. I am back to square one. :)

ghost avatar Jan 14 '16 04:01 ghost

The reason behind this issue is still unknown. Does anyone have any idea what could cause this inconsistency between Strelka's results on clipped and unclipped BAMs?

ghost avatar Jan 16 '16 03:01 ghost

Back to mktrost's question from earlier: when you look at the reads supporting the missing variant (and maybe their mates), are there any differences, other than the expected CIGAR changes, between the reads before and after clipOverlap?

Is your comment from earlier still correct, or are there other reads that should be looked into?

pjvandehaar avatar Jan 16 '16 04:01 pjvandehaar

No, the earlier comments are correct. The only difference is the CIGAR for some of the reads.

ghost avatar Jan 16 '16 04:01 ghost

I don't know how Strelka determines its results. It does seem odd that it would lose the deletion if the only difference for that position is that 80 bases away from it a read is softclipped. If the deletion was in a clipped position that would be an obvious explanation, but you said that wasn't the case.

Maybe Strelka takes into account read length and because the matching region is shorter it thinks it is more likely to be mismapped.

Do you know anything about how Strelka works or determines its results? That might help figure it out.

I had asked previously about flag differences in case clipOverlap made a read unmapped due to orientation clipping, because that could be a reason for losing evidence for the deletion. But it sounds like you checked and that isn't the case.

Let me know if I'm misunderstanding your question. FYI, there is a bamUtil diff tool (with --all option) that may be helpful for comparing BAMs if you need it. Unfortunately it doesn't have an option to limit the diff to a specific region. I will try to add that in the future.

mktrost avatar Jan 16 '16 05:01 mktrost

As far as I know, Strelka doesn't remove reads based on their length. Reads have different lengths anyway, so that should not be the issue.

About your question on flags: I couldn't find a way to check them. Should I look at the BAM files directly?

I have extracted the reads in this region, with all their information, into two files. Do you think the bamUtil diff tool would work on these files?
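If the two extracted files are SAM text (for example, the output of `samtools view` on the same region of each BAM), a small script can compare them field by field. This is only a sketch under that assumption, not the bamUtil diff tool: the function names and the example records are hypothetical, and reads are matched by name and mate number, so secondary or supplementary alignments of the same read would overwrite each other.

```python
FIELDS = ("flag", "chrom", "pos", "mapq", "cigar")


def load_sam(lines):
    """Index SAM records by (read name, mate number), skipping header lines."""
    reads = {}
    for line in lines:
        if not line.strip() or line.startswith("@"):
            continue
        f = line.rstrip("\n").split("\t")
        flag = int(f[1])
        # 0x40 marks the first read of a pair, 0x80 the second; 0 means unpaired.
        mate = 1 if flag & 0x40 else 2 if flag & 0x80 else 0
        reads[(f[0], mate)] = {
            "flag": flag, "chrom": f[2], "pos": int(f[3]),
            "mapq": int(f[4]), "cigar": f[5],
        }
    return reads


def diff_reads(before, after):
    """Yield (key, field, old, new) for each difference between two extracts."""
    for key in sorted(set(before) | set(after)):
        if key not in before or key not in after:
            yield key, "presence", key in before, key in after
            continue
        for field in FIELDS:
            if before[key][field] != after[key][field]:
                yield key, field, before[key][field], after[key][field]


if __name__ == "__main__":
    # Made-up records for illustration only.
    before = load_sam(["r1\t99\tchr1\t1000\t60\t100M\t=\t1080\t180\t*\t*"])
    after = load_sam(["r1\t99\tchr1\t1000\t60\t80M20S\t=\t1080\t180\t*\t*"])
    for change in diff_reads(before, after):
        print(change)
```

To use it on real extracts, pass open file handles to `load_sam` in place of the lists. Any reported `flag` or `presence` difference would be worth a closer look, since only CIGAR changes (and the POS shift that comes with a clip at the start of a read) are expected.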

ghost avatar Jan 16 '16 05:01 ghost

I tried something new today: I ran Strelka using the unclipped BAM for normal and the clipped BAM for tumor, and vice versa. Surprisingly, whenever I used the unclipped normal BAM I got the correct results. So I am guessing there may be a problem with the clipped normal BAM?
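The swap experiment above is a small two-by-two design, and the reasoning can be written down as a check of which input's state alone predicts the outcome. The sketch below encodes my reading of the reported results (deletion called whenever the normal BAM is unclipped, missed otherwise); the function name and the result table are illustrative assumptions, not Strelka output.

```python
def responsible_inputs(results):
    """Return the inputs whose state alone predicts whether the variant is called.

    results maps (normal_state, tumor_state) -> True if the deletion was called.
    """
    names = ("normal", "tumor")
    explaining = []
    for idx, name in enumerate(names):
        by_state = {}
        for states, called in results.items():
            by_state.setdefault(states[idx], set()).add(called)
        # An input explains the outcome if each of its states always gives the
        # same result and the two states give different results.
        consistent = all(len(v) == 1 for v in by_state.values())
        differs = len({next(iter(v)) for v in by_state.values()}) > 1
        if consistent and differs:
            explaining.append(name)
    return explaining


if __name__ == "__main__":
    # Assumed encoding of the outcomes described in this thread.
    reported = {
        ("unclipped", "unclipped"): True,
        ("unclipped", "clipped"): True,
        ("clipped", "unclipped"): False,
        ("clipped", "clipped"): False,
    }
    print(responsible_inputs(reported))
```

Under that encoding only the normal BAM is flagged, which matches the guess that the clipped normal BAM is where to look next.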

ghost avatar Jan 18 '16 19:01 ghost