CLI option to always include SEQ/QUAL in SAM output

Open sjaenick opened this issue 6 months ago • 2 comments

Hi there,

currently, strobealign (0.16.1, with -N 5) does not include SEQ/QUAL fields for non-primary alignments.

https://github.com/ksahlin/strobealign/blob/main/src/sam.cpp#L190
https://github.com/ksahlin/strobealign/blob/main/src/sam.cpp#L199

This is perfectly fine and correct with regard to the SAM spec.

However, it became a problem for me when filtering the alignments vs. a set of regions defined via a BED file (i.e., samtools view -L genes.bed). If the primary alignment is located outside of the defined regions, and one or several secondary alignments are within them, only the secondary alignments are retained. As a consequence, none of the alignments for a read has SEQ/QUAL in the filtered output file.

(I'll open a separate issue for samtools to discuss whether SEQ/QUAL should be transferred from primary to (at least the first) secondary alignment in this situation.)

Would you consider adding a new CLI parameter to include SEQ/QUAL in secondary alignments?

sjaenick, Jun 25 '25 13:06

I’d tend to say that this sounds more like something that should be solved within samtools and not within each individual read mapper. Is there precedent for such an option in other read mappers?

An advantage of doing this in the read mapper would be that it is very easy at that stage because all alignments for a single read are still grouped together. If it’s done on a sorted BAM file, you’d need to look up the primary alignment whenever you encounter a secondary alignment without SEQ/QUAL. I’m not totally against this, but I’d like to read the discussion on the samtools issue tracker first.

Or another idea: There could be a separate tool to do this that you pipe the output from the read mapper into.
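To make the "separate tool" idea concrete, here is a rough sketch of what such a filter might look like. It is not an existing tool, and all function names are made up for illustration. It assumes uncompressed SAM text in which all records for a read name are adjacent (as in unsorted read mapper output), and it leaves records with hard clips (H in the CIGAR) untouched, because those would need a trimmed SEQ. When a secondary alignment is on the opposite strand from the primary, SEQ is reverse-complemented and QUAL reversed.

```python
import sys
from itertools import groupby

# Complement table for reverse-complementing SEQ (covers plain bases and N only).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

SECONDARY_OR_SUPPLEMENTARY = 0x100 | 0x800
REVERSE = 0x10
MATE_BITS = 0x40 | 0x80  # first / second in pair


def fill_group(records):
    """Fill SEQ/QUAL in place for one read name; records are lists of SAM fields."""
    # Pass 1: remember SEQ/QUAL of each mate's primary record.
    sources = {}
    for f in records:
        flag = int(f[1])
        if f[9] != "*" and not flag & SECONDARY_OR_SUPPLEMENTARY and "H" not in f[5]:
            sources[flag & MATE_BITS] = (f[9], f[10], bool(flag & REVERSE))
    # Pass 2: copy into records whose SEQ is missing.
    for f in records:
        flag = int(f[1])
        src = sources.get(flag & MATE_BITS)
        if f[9] != "*" or src is None or "H" in f[5]:
            continue  # already has SEQ, no primary seen, or hard-clipped: leave as-is
        seq, qual, src_is_reverse = src
        if bool(flag & REVERSE) != src_is_reverse:
            # Opposite strand: SEQ must be reverse-complemented, QUAL reversed.
            seq = seq.translate(_COMPLEMENT)[::-1]
            if qual != "*":
                qual = qual[::-1]
        f[9], f[10] = seq, qual
    return records


def fill_stream(lines):
    """Yield SAM lines with SEQ/QUAL copied from primary to non-primary records."""
    def read_name(line):
        # Header lines start with '@'; QNAME cannot, so None marks a header.
        return None if line.startswith("@") else line.split("\t", 1)[0]

    for qname, group in groupby(lines, key=read_name):
        if qname is None:
            yield from group
            continue
        records = [line.rstrip("\n").split("\t") for line in group]
        for fields in fill_group(records):
            yield "\t".join(fields) + "\n"


def main():
    sys.stdout.writelines(fill_stream(sys.stdin))
```

Saved under a hypothetical name such as fill_seq.py, with a call to main() at the end, it would sit between the mapper and the sort step, roughly like strobealign ... | python fill_seq.py | samtools sort -o sorted.bam. Because it relies on records being grouped by read name, it would not work on a coordinate-sorted file.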

Or even better: Because the typical way of using strobealign is to pipe its output into samtools sort (strobealign ... | samtools sort -o sorted.bam), this could become an option in samtools sort instead, so you would have something like

strobealign ... | samtools sort --copy-primary-seq-and-qual-to-secondary -o out.bam

marcelm, Jun 26 '25 12:06

I tend to agree. https://github.com/samtools/samtools/issues/2232 is the corresponding samtools issue.

sjaenick, Jun 26 '25 13:06