UMICollapse icon indicating copy to clipboard operation
UMICollapse copied to clipboard

Read duplication ?

Open karlkashofer opened this issue 2 years ago • 1 comments

I use your code in a exome pipeline which includes manta. Manta is super picky about the input bam files and it chokes on one of our current bams with a "Unexpected alignment name collision".

It seems to be caused by a read which is duplicated in the bam file:

`devarea@sx253:~/karl/PathoCromwell/cromwell-executions/Agilent_Exome_Single/43dccd69-b489-4c7b-b92b-bd9d00490356/call-AgilentDedup/execution$ samtools view D2758-8_Trimmed_Mapped_Deduped.bam chr22 | grep "A01664:83:HHTWYDMXY:2:2434:20672:19977_TAAGTCCTGT"

A01664:83:HHTWYDMXY:2:2434:20672:19977_TAAGTCCTGT 99 chr22 22734872 60 100M34S = 22734872 100 GTGACGTTGGTAGTTATAACCGTGTCTCCTGGTACCAGCAGCCCCCAGGCACAGCCCCCAAACTCATGATTTATGAGGTCAGTAATCGGCCCTCAGGGGTTTGTGCTGTACAAATACAAGCTCCTGCCACGGTC FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NM:i:0 MD:Z:100 MC:Z:100M34S AS:i:100 XS:i:72 RG:Z:D2758-8 SA:Z:chrX,43744471,+,95S39M,60,0; A01664:83:HHTWYDMXY:2:2434:20672:19977_TAAGTCCTGT 147 chr22 22734872 60 100M34S = 22734872 -100 GTGACGTTGGTAGTTATAACCGTGTCTCCTGGTACCAGCAGCCCCCAGGCACAGCCCCCAAACTCATGATTTATGAGGTCAGTAATCGGCCCTCAGGGGTTTGTGCTGTACAAATACAAGCTCCTGCCACGGTC FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFF:FFFFFF:FFFFF:FFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFF NM:i:0 MD:Z:100 MC:Z:100M34S AS:i:100 XS:i:72 RG:Z:D2758-8 SA:Z:chrX,43744471,-,95S39M,60,0; A01664:83:HHTWYDMXY:2:2434:20672:19977_TAAGTCCTGT 147 chr22 22734872 60 100M34S = 22734872 -100 GTGACGTTGGTAGTTATAACCGTGTCTCCTGGTACCAGCAGCCCCCAGGCACAGCCCCCAAACTCATGATTTATGAGGTCAGTAATCGGCCCTCAGGGGTTTGTGCTGTACAAATACAAGCTCCTGCCACGGTC FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFF:FFFFFF:FFFFF:FFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFF NM:i:0 MD:Z:100 MC:Z:100M34S AS:i:100 XS:i:72 RG:Z:D2758-8 SA:Z:chrX,43744471,-,95S39M,60,0; `

As you can see, the second read is present twice in the bam produced by umicollapse.

If i do the same grep on the input bam before umicollapse i dont see the duplicate read:

`devarea@sx253:~/karl/PathoCromwell/cromwell-executions/Agilent_Exome_Single/43dccd69-b489-4c7b-b92b-bd9d00490356/call-AgilentDedup/execution$ samtools view ../inputs/816700548/D2758-8.bam chr22 | grep "A01664:83:HHTWYDMXY:2:2434:20672:19977_TAAGTCCTGT"

A01664:83:HHTWYDMXY:2:2434:20672:19977_TAAGTCCTGT 99 chr22 22734872 60 100M34S = 22734872 100 GTGACGTTGGTAGTTATAACCGTGTCTCCTGGTACCAGCAGCCCCCAGGCACAGCCCCCAAACTCATGATTTATGAGGTCAGTAATCGGCCCTCAGGGGTTTGTGCTGTACAAATACAAGCTCCTGCCACGGTC FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NM:i:0 MD:Z:100 MC:Z:100M34S AS:i:100 XS:i:72 RG:Z:D2758-8 SA:Z:chrX,43744471,+,95S39M,60,0; A01664:83:HHTWYDMXY:2:2434:20672:19977_TAAGTCCTGT 147 chr22 22734872 60 100M34S = 22734872 -100 GTGACGTTGGTAGTTATAACCGTGTCTCCTGGTACCAGCAGCCCCCAGGCACAGCCCCCAAACTCATGATTTATGAGGTCAGTAATCGGCCCTCAGGGGTTTGTGCTGTACAAATACAAGCTCCTGCCACGGTC FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFF:FFFFFF:FFFFF:FFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFF NM:i:0 MD:Z:100 MC:Z:100M34S AS:i:100 XS:i:72 RG:Z:D2758-8 SA:Z:chrX,43744471,-,95S39M,60,0; devarea@sx253:~/karl/PathoCromwell/cromwell-executions/Agilent_Exome_Single/43dccd69-b489-4c7b-b92b-bd9d00490356/call-AgilentDedup/execution$ `

Any idea what might have caused that ? regards, KK

karlkashofer avatar Dec 10 '22 18:12 karlkashofer

Hm, that is definitely weird. I don't know what's wrong off the top of my head. What command did you run for UMICollapse? Does this happen for other bam records?

Daniel-Liu-c0deb0t avatar Dec 11 '22 05:12 Daniel-Liu-c0deb0t