abra2 icon indicating copy to clipboard operation
abra2 copied to clipboard

Explosion on CIGARS using = and X operators

Open Lenbok opened this issue 6 years ago • 3 comments

I was trying out ABRA2 on a set of alignments produced by RTG, which use the = and X CIGAR operators to indicate match/mismatch instead of the generic M operator. These CIGARS make ABRA fail with:

java.lang.UnsupportedOperationException: Unhandled cigar operator: = in: 2195212 : 124= at abra.SAMRecordWrapper.getSpanningRegions(SAMRecordWrapper.java:203) at abra.Feature.findAllOverlappingRegions(Feature.java:126) at abra.ReAligner.processChromosomeChunk(ReAligner.java:314) at abra.ReAlignerRunnable.go(ReAlignerRunnable.java:21) at abra.AbraRunnable.run(AbraRunnable.java:20) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:748)

BTW, this is using the 2.12 release.

Lenbok avatar Nov 08 '17 01:11 Lenbok

Thanks for reporting this. Are you looking for the realigned reads to also use the = X format?

mozack avatar Nov 08 '17 18:11 mozack

= and X are nice for at-a-glance eyeballing mismatches, but I'm more concerned with having the tool function than whether the output CIGARs preserve the types of operators used. It would probably be fine if adjacent =XM elements were just merged to M up front.

Lenbok avatar Nov 08 '17 20:11 Lenbok

As a workaround I ran one set of mappings through a pre-processor that converted the CIGARS to use the legacy format, and abra seemed to run after I added the following patch:

diff --git a/src/main/java/abra/MultiSamReader.java b/src/main/java/abra/MultiSamReader.java
index e52db0e..9fa66a5 100644
--- a/src/main/java/abra/MultiSamReader.java
+++ b/src/main/java/abra/MultiSamReader.java
@@ -84,6 +84,7 @@ public class MultiSamReader implements Iterable<SAMRecordWrapper> {
                return (
                        (!read.getDuplicateReadFlag()) &&
                        (!read.getReadFailsVendorQualityCheckFlag()) &&
+                       (read.getReadLength() > 0) &&
                        (read.getMappingQuality() >= this.minMapqForAssembly || read.getReadUnmappedFlag()) &&
                        SAMRecordUtils.isPrimary(read));  // Was previously an id check, so supplemental / secondary alignments could be included
        }

This was required because occasionally one arm of a paired read was quality trimmed to length 0 and reported in the BAM as an unmapped alignment placed with respect to its aligned mate. Without this patch abra would try to assemble the zero length read (where getReadString() returned "*") and explode.

Lenbok avatar Nov 09 '17 00:11 Lenbok