
Fix mate sequence when reading sam/bam

Open jbedo opened this issue 3 years ago • 0 comments

Previously, when reading SAM/BAM, the case tid == mtid was detected and the mate sequence name was mapped to "=". This causes missing sequence name errors upon decompression. Because the tid == mtid case is already handled when writing SAM/BAM, this patch simply records the full mate sequence name, which resolves the name matching errors.

Example read after decompression, before the patch (the mate sequence name column is "*"):

SL1344_1_530_0:0:0_0:0:0_6c9    163     SL1344  1       60      70M     *       461     530     AGAGATTACGTCTGGTTGCAAGAGATCATGACAGGGGGAATTGGTTGAAAATAAATATATCGCCAGCAGC  IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII       MQ:i:60 AS:i:70 RG:Z:mysample1  NM:i:0  MC:Z:70M        MD:Z:70 ms:i:2800       XS:i:0

and after the patch (the mate sequence name column is "="):

SL1344_1_530_0:0:0_0:0:0_6c9    163     SL1344  1       60      70M     =       461     530     AGAGATTACGTCTGGTTGCAAGAGATCATGACAGGGGGAATTGGTTGAAAATAAATATATCGCCAGCAGC  IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII       MQ:i:60 AS:i:70 RG:Z:mysample1  NM:i:0  MC:Z:70M        MD:Z:70 ms:i:2800       XS:i:0
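
For illustration only, here is a minimal sketch of the idea behind the patch, written in Python rather than quip's C. The function names `mate_name_on_read` and `rnext_on_write` are hypothetical and do not come from the quip source. The sketch assumes the BAM convention that a negative mtid means no mate reference is set.

```python
def mate_name_on_read(ref_names, tid, mtid):
    """Return the full mate sequence name to record, or None if unset.

    Hypothetical helper: it never collapses the name to "=", even when
    mtid == tid, so later lookups by name can always succeed.
    """
    if mtid < 0:
        return None
    return ref_names[mtid]


def rnext_on_write(rname, mate_name):
    """Return the text for the mate sequence name column on output.

    The "=" shorthand is applied only here, at write time.
    """
    if mate_name is None:
        return "*"
    return "=" if mate_name == rname else mate_name


if __name__ == "__main__":
    refs = ["SL1344"]
    stored = mate_name_on_read(refs, tid=0, mtid=0)
    print(stored, rnext_on_write(refs[0], stored))
```

Keeping the full name in the stored record and applying the "=" shorthand only on output means the decompressor never has to resolve "=" as if it were a sequence name.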

jbedo, May 31 '22 06:05