abra2

Supplementary alignment behavior question

Open · dstreett opened this issue 5 years ago · 1 comment

Thanks for the software; it has been incredibly helpful. Hopefully this question is not a repeat and this is an appropriate place to ask it.

I’m curious about some behavior I’m seeing with regard to supplementary alignments.

The first case is fairly standard (input BAM: out.perfect.bam, output: abra.perfect.bam). The read is a perfect match to hg19, chr12 start=12043874 stop=12044029, except that I put a 34 bp insertion into this region 87 bases in. The “perfect” CIGAR is therefore 87M34I35M; however, after alignment I get primary CIGAR 87M69S and supplementary CIGAR 121H35M. After running ABRA2, the primary CIGAR is 87M34I35M (perfect, exactly what we expect), but the supplementary CIGAR is unchanged at 121H35M.
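The CIGAR arithmetic in this case can be checked with a short script. The sketch below is a minimal Python helper written for this illustration (the names `parse_cigar`, `read_length` and `ref_span` are hypothetical, not part of ABRA2); it applies the SAM specification's rules for which operations consume read bases and which consume reference bases.

```python
import re

# Each CIGAR operation is a length followed by one operation character.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")

# Per the SAM spec: M/I/S/=/X consume read bases, M/D/N/=/X consume reference.
# H and P consume neither (hard-clipped bases are absent from SEQ).
QUERY_OPS = set("MIS=X")
REF_OPS = set("MDN=X")


def parse_cigar(cigar):
    """Return a list of (length, op) tuples, rejecting malformed strings."""
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]
    if "".join(f"{n}{op}" for n, op in ops) != cigar:
        raise ValueError(f"malformed CIGAR: {cigar!r}")
    return ops


def read_length(cigar):
    """Full original read length, counting hard-clipped bases too."""
    return sum(n for n, op in parse_cigar(cigar) if op in QUERY_OPS or op == "H")


def ref_span(cigar):
    """Number of reference bases covered by the alignment."""
    return sum(n for n, op in parse_cigar(cigar) if op in REF_OPS)


# Case 1: the corrected alignment and the two split records from the aligner.
for cigar in ("87M34I35M", "87M69S", "121H35M"):
    print(cigar, read_length(cigar), ref_span(cigar))
```

It prints `87M34I35M 156 122`, `87M69S 156 87` and `121H35M 156 35`: all three records describe the same 156-base read, and the two split pieces span 87 + 35 = 122 reference bases, the same total as the single corrected alignment.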

My first question is: why is the supplementary alignment kept? Since the primary alignment was fixed and now overlaps this supplementary alignment, it seems the supplementary alignment should be removed completely. Is there a flag that produces this removal behavior, or is this expected behavior?

The second case is a bit odder (input BAM: out_noise.bam, output: abra.noise.bam). The input is approximately the same (a 34 bp insertion), but there is some noise on both sides (primary CIGAR: 37M3D47M73S, supplementary CIGAR: 118H14M2D25M). The primary alignment is again exactly what I would expect, 37M3D47M34I14M2D25M (perfect); however, the hard clips were trimmed off the supplementary alignment, leaving 14M2D25M, which I found odd.
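One way to see why dropping the hard clips is a problem: all records for one read should imply the same read length once hard-clipped bases are counted. The helper below (hypothetical names; a sketch, not ABRA2 code) compares the two case 2 records before and after realignment.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def read_length(cigar):
    """Full read length implied by a CIGAR: M/I/S/=/X plus hard clips (H)."""
    return sum(int(n) for n, op in CIGAR_RE.findall(cigar) if op in "MIS=XH")


def lengths_agree(primary_cigar, supplementary_cigar):
    """Records of one read should imply the same read length once hard
    clips are counted; a mismatch means clip information was lost."""
    return read_length(primary_cigar) == read_length(supplementary_cigar)


# Case 2 CIGARs from the report.
print(lengths_agree("37M3D47M73S", "118H14M2D25M"))      # input BAM
print(lengths_agree("37M3D47M34I14M2D25M", "14M2D25M"))  # ABRA2 output
```

It prints `True` for the input pair (157 bases each) and `False` for the output pair, where the trimmed 14M2D25M implies only 39 bases, so the position of those bases within the read is no longer recorded.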

Why would ABRA2 modify the supplementary CIGAR in this way? It seems it should either have been modified like this in case 1 as well, or simply removed.

This analysis was run with very vanilla options (java -Xmx6G -jar /usr/bin/abra2.jar --in out.perfect.bam --out abra.perfect.bam --ref /usr/share/archer/reference/hg19/hg19.fa --threads 1 --tmpdir tmp), so I might just be missing a parameter. The input and output BAMs are attached. Let me know if any logs or additional information would be useful.

Thank you for your help and your software.

abra_inputs_and_outputs.tar.gz

[root@5e9b723c8cbd tests]# java -Xmx6G -jar /usr/bin/abra2.jar
INFO    Tue Mar 12 18:21:24 UTC 2019    Abra version: 2.19

dstreett · Mar 12 '19 18:03

Thanks for reporting this. At present, supplementary alignments are eligible for realignment just like other reads, and this may not be the best behavior.

The removal of hard clips looks like a bug. We should probably not be trying to remap hard-clipped reads in the first place.
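The suggestion to skip hard-clipped reads could take the form of a pre-filter like the one below. It is a hypothetical Python sketch, not ABRA2's Java code; beyond the SAM specification's FLAG bit 0x800 (supplementary alignment), the only assumption is the policy itself: supplementary records and any record whose CIGAR contains H are left untouched.

```python
FLAG_SUPPLEMENTARY = 0x800  # "supplementary alignment" bit of the SAM FLAG field


def eligible_for_realignment(flag, cigar):
    """Hypothetical pre-filter: skip supplementary or hard-clipped records.

    A hard-clipped record no longer carries the clipped bases in SEQ, so
    remapping it cannot recover them and may discard the clip lengths.
    """
    if flag & FLAG_SUPPLEMENTARY:
        return False
    return "H" not in cigar


for flag, cigar in [(0, "37M3D47M73S"), (0x800, "118H14M2D25M")]:
    print(cigar, eligible_for_realignment(flag, cigar))
```

With this policy, the case 2 supplementary record 118H14M2D25M would pass through unchanged instead of being trimmed, while the soft-clipped primary would still be realigned.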

As for resolving supplementary reads in cases where the primary alignment can be corrected into a single alignment, that will not be a small fix: supplementary reads can be mapped to loci far away from the primary alignment, and ABRA2 only processes regions locally. We may be able to provide a post-hoc cleanup step for these cases if that is of interest.
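A post-hoc cleanup of the kind described above might look like the following sketch. It assumes records have already been read into a minimal hypothetical `Record` tuple (query name, FLAG, CIGAR), keys primaries by query name plus the read1/read2 FLAG bits, and drops a supplementary record only when every read base it aligns falls inside the primary's aligned bases. This is a read-coordinate heuristic chosen for illustration; it does not compare reference positions.

```python
import re
from collections import namedtuple

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
FLAG_REVERSE = 0x10            # SEQ is reverse complemented
FLAG_SECONDARY = 0x100         # secondary alignment
FLAG_SUPPLEMENTARY = 0x800     # supplementary alignment
FLAG_MATE_BITS = 0x40 | 0x80   # first/last segment in the template

# Hypothetical minimal record; a real tool would read these fields from a BAM.
Record = namedtuple("Record", "qname flag cigar")


def _clipped(ops):
    """Total length of S/H operations at the start of ops."""
    total = 0
    for n, op in ops:
        if op not in "SH":
            break
        total += n
    return total


def query_interval(cigar, is_reverse):
    """Half-open [start, end) of aligned read bases, in original read order."""
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]
    length = sum(n for n, op in ops if op in "MIS=XH")
    start, end = _clipped(ops), length - _clipped(reversed(ops))
    if is_reverse:
        # CIGARs follow the forward reference strand; flip to read orientation.
        start, end = length - end, length - start
    return start, end


def _interval(rec):
    return query_interval(rec.cigar, bool(rec.flag & FLAG_REVERSE))


def drop_redundant_supplementary(records):
    """Return records minus supplementary alignments whose aligned read bases
    all fall inside the same read's primary alignment."""
    records = list(records)
    primary = {}
    for rec in records:
        if not rec.flag & (FLAG_SECONDARY | FLAG_SUPPLEMENTARY):
            primary[(rec.qname, rec.flag & FLAG_MATE_BITS)] = _interval(rec)
    kept = []
    for rec in records:
        key = (rec.qname, rec.flag & FLAG_MATE_BITS)
        if rec.flag & FLAG_SUPPLEMENTARY and key in primary:
            p_start, p_end = primary[key]
            s_start, s_end = _interval(rec)
            if p_start <= s_start and s_end <= p_end:
                continue  # fully covered by the corrected primary
        kept.append(rec)
    return kept


after_abra2 = [Record("read1", 0, "87M34I35M"), Record("read1", 0x800, "121H35M")]
print([r.cigar for r in drop_redundant_supplementary(after_abra2)])  # ['87M34I35M']
```

With pysam, the `query_name`, `flag` and `cigarstring` attributes of each `AlignedSegment` map onto the three fields used here. A real cleanup would also need to rewrite or drop the primary record's `SA` tag, if present, since it would otherwise still point at the removed supplementary alignment.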

mozack · Mar 15 '19 13:03