gem3-mapper
gem3-mapper copied to clipboard
Supplementary hits
Howdy, thanks for your great work on GEM! I'm wondering if there's a way to get GEM to find / emit more supplementary hits with accompanying SA tag. This is important for some structural variant callers (e.g. Manta) to do their work. I see some reads get the XA tag with a similar format, but I think that's a little different.
Thanks!
Hi,
Option "--max-reported-matches|-M
./bin/gem-mapper --max-reported-matches 100
Let us known. Cheers,
Thanks! I just tried that, but it's still not finding any supplementary hits for some of these reads, compared to BWA, at least). The reads do have a bunch of softclipped bases and relatively low mapping quality score, but no SA or XA tags.
Update: After scanning the whole BAM, I'm actually not finding any reads as being flagged as supplementary (according to samtools view -f 0x800..), maybe I'm doing something wrong? The command I'm executing is:
gem-mapper -I human_g1k_v37_decoy_phiXAdaptr.gem -M 100 -1 trimmed-pair1.fastq -2 trimmed-pair2.fastq -t 12 -r '"@RG ID:None PL:ILLUMINA LB:None SM:None PU:None"' | samtools sort -@ 12 -O BAM -o sorted.bam
Ok.
Then, let's try to push the mapper a little bit further:
gem-mapper [...] --mapping-mode=sensitive
Certainly, this will generate more alt-matches. Don't forget to include '--max-reported-matches 100' or similar. Otherwise, I can have a look myself until the tool floods you with alt-mappings.
Cheers,
Hi again, I tried adding "--mapping-mode=sensitive" (and retaining -M 100), but I'm still not seeing any reads with the "supplementary" (0x800) flag set, or that have the SA tag, for the entire set of about 50M reads. This feels like it must be a bug (maybe mine?) - GEM should at least occasionally find reads with supplementary mappings, right?
It's weird. Nonetheless, if you can share index/reference (link to .fa file) and just 1 sequence. I can rerun and try to see what is the problem.
Thanks for looking into this, I'm aligning to the human genome reference (hg19), here's an example of a read pair that BWA thinks should have a supplementary hit, but doesn't seem to get one:
@A00576:54:HCTG7DRXX:2:2260:31467:35697:CTGTA+CAACT CTTACCAAACTCTAAATTTTCTCTTGGAAACTCCCATTTGAGATCATATTCATATTCTCTGAAATAAACGTAGAAGTACTCATTATCTGAGGAGCCTCTCTTGCCAAACTCTAAATTTTCTCTTGGAAACTCCCATTTGATATCAT + ADC@BBACBBE:EACBADCBBDBDD+CBCCB?CCCB:.DEBDBBCBBABD9@BA@DCEC1DBCC9CCB;,A+BC+BABE,BBDAB,??+6BD:CACEBED9-,@87A1,=>99+B-;BD,B.*8ABB6B,8+A?C;9+++:,8?
..and its pair: @A00576:54:HCTG7DRXX:2:2260:31467:35697:CTGTA+CAACT GAAGTACTCATTATCTGAGGAGCCGGTCACCTGTACCATCTGTAGCTGGCTTTCATACCTAAATTGCTTCAGAGATGAAATGATGAGTCAGTTAGGAATAGGCAGTTCTGCAGATAGAGGAAAGAATAATGAATTTTTACCTTTGC + ,=E++BECFC.B@BECDEDCFDD;CCCFCD;BBBC:FBCDCCBEEEC;DECDC:CBC+EBDE,DCE,C,EFDFDCCD>ECCCBBCECBFFB<BFDCDBBEDDFFCDCDCDEECBAECECCDDEDDB@DBACDBCBBBAACCCBACA
...not sure if this particular read pair will illuminate the issue (maybe GEM is missing this one for legitimate reasons), but I can supply a bunch more if needed. Thanks again!
Hi,
For the given sequence, they both seem to report the same results. The only difference is that BWA-MEM uses the SA tag and GEM the XA tag.
BWA-MEM => Primary chr13:28608214:96M50S and suplementary SA:Z:chr13,28608218,+,103S43M,60,1; Primary chr13:28608286:146M
GEM => Primary chr13:28608214:96M50S and extra-alignment XA:Z:chr13,+28608213,98S48M,100; Primary chr13:28608286:146M
Yes, you're totally right, maybe that wasn't the greatest example. Below I pasted a read pair for which the XA tag isn't present at all. But the important point isn't that GEM might disagree sometimes with BWA (maybe GEM is more accurate and BWA is wrong!), it's that GEM never produces a read mapping with the SA tag or the supplementary flag set - not a single one in many of the alignments I've looked at. Is it expected that GEM doesn't produce supplementary hits at all?
Here's a read pair that doesn't have an XA tag (or SA, or supplementary bit set), but which gets a supplementary hit in BWA.
@A00576:54:HCTG7DRXX:1:2275:18656:16705:GTAGT+CGTGT
ACTCCCATTTGAGATCATATTCATATTCTCTGAAATCAACGTAGAAGTACTCATTATCTGAGGAGCCTCTCTTGCCAAACTCTAAATTTTCTCTTGGAAACTCCCATTTGAGATCATATTCATATTCTCTGAAATCAACGTAGAAG
+
AADBCBABCDEBD9BBAB@@CBAAABDCECEE9CCBCBCB;BADBCDBABEABBBABCEEADCBBCCECECEDEDCBCCBEC?ACCBDDABACECE6BCBBECBB@?@CD7B?ABA@>ACBA?@ACBDBEBAB?A@ABB:AADBB*
@A00576:54:HCTG7DRXX:1:2275:18656:16705:GTAGT+CGTGT
ACTCCCATTTGAGATCATATTCATATTCTCTGAAATCAACGTAGAAGTACTCATTATCTGAGGAGCCTCTCTTGCCAAACTCTAAATTTTCTCTTGGAAACTCCCATTTGAGATCATATTCATATTCTCTGAAATCAACGTAGAAG
+
AADBCBABCDEBD9BBAB@@CBAAABDCECEE9CCBCBCB;BADBCDBABEABBBABCEEADCBBCCECECEDEDCBCCBEC?ACCBDDABACECE6BCBBECBB@?@CD7B?ABA@>ACBA?@ACBDBEBAB?A@ABB:AADBB*
Hi,
(1) These 2 reads are, in fact, the same read duplicated. Is that what you meant? (2) Both BWA-MEM and GEM report essentially the same (using different tags). BWA seems to me a little bit redundant, but ok.
BWA Primary chr13 28608218 60 74S72M and suplementary SA:Z:chr13,28608243,+,67M79S,60,0; Primary chr13 28608243 60 67M79H and suplementary SA:Z:chr13,28608218,+,74S72M,60,0;
GEM Primary chr13 28608213 0 69S77M and extra-alignment XA:Z:chr13,+28608243,67M79S,79
(3) Supplementary hits
GEM never produces a read mapping with the SA tag or the supplementary flag set
Yes. GEM outputs these supplementary as extra-alignment. Different tag/format, just that.
Let me know your thoughts.
Thanks again for your help with this! So is it correct to say that GEM doesn't emit reads with the SA tag or supplementary bit set, but instead encodes the same info in the XA tag? That's totally fine, just wanted to make sure that I'm not missing some important info. I think some structural variant detection tools (e.g. Lumpy, Manta) specifically look for reads with SA tags + supplementary bit to trigger a search for SVs, so we are currently missing some SV calls when aligning with GEM compared to BWA. If it's easy to alter GEM's output to use the SA tag in place of XA it might improve out-of-the-box sensitivity a bit for these types of variants. Thanks again for your hard work on GEM - it really is much faster than BWA and yields similar sensitivity and somewhat better PPV for small variants in our data
Thanks again for your help with this! So is it correct to say that GEM doesn't emit reads with the SA tag or supplementary bit set, but instead encodes the same info in the XA tag?
Yes, precisely.
If it's easy to alter GEM's output to use the SA tag in place of XA it might improve out-of-the-box sensitivity a bit for these types of variants.
Yes, it is very easy to adapt. I tag this issue as a feature request.
Thanks for your support and nice words.