CLI option to always include SEQ/QUAL in SAM output
Hi there,
currently, strobealign (0.16.1, with -N 5) does not include SEQ/QUAL fields for non-primary alignments.
https://github.com/ksahlin/strobealign/blob/main/src/sam.cpp#L190 https://github.com/ksahlin/strobealign/blob/main/src/sam.cpp#L199
This is perfectly fine and correct with regard to the SAM spec.
However, it became a problem for me when filtering the alignments vs. a set of regions defined
via a BED file (i.e., samtools view -L genes.bed). If the primary alignment is located outside of the
defined regions, and one or several secondary alignments are within them, only the secondary
alignments are retained. As a consequence, none of the alignments for a read has SEQ/QUAL
in the filtered output file.
(I'll open a separate issue for samtools to discuss whether SEQ/QUAL should be transferred from primary to (at least the first) secondary alignment in this situation.)
Would you consider adding a new CLI parameter to include SEQ/QUAL in secondary alignments?
I’d tend to say that this sounds more like something that should be solved within samtools and not within each individual read mapper. Is there precedent for such an option in other read mappers?
An advantage of doing this in the read mapper would be that it is very easy at that stage because all alignments for a single read are still grouped together. If it’s done on a sorted BAM file, you’d need to look up the primary alignment whenever you encounter a secondary alignment without SEQ/QUAL. I’m not totally against this, but I’d like to read the discussion on the samtools issue tracker first.
Or another idea: There could be a separate tool to do this that you pipe the output from the read mapper into.
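To make the separate-tool idea concrete, here is a minimal sketch of such a filter in Python. It is not an existing tool, and the function names `fill_group` and `fill_stream` are made up for illustration. It assumes uncompressed SAM text in which all records of a read are adjacent (as in unsorted mapper output), that missing SEQ/QUAL are written as `*`, and that the secondary alignments are not hard-clipped. If a secondary alignment is on the opposite strand from the primary one, SEQ is reverse-complemented and QUAL reversed, since SAM stores both relative to the forward reference strand.

```python
# Complement table for IUPAC nucleotide codes (upper and lower case)
_COMPLEMENT = str.maketrans(
    "ACGTRYSWKMBDHVNacgtryswkmbdhvn",
    "TGCAYRSWMKVHDBNtgcayrswmkvhdbn",
)


def fill_group(records):
    """Fill SEQ/QUAL of secondary records in place.

    `records` is a list of SAM field lists that share one QNAME.
    """
    # Remember SEQ, QUAL and strand of the primary record of each mate.
    # flag & 0xC0 is 0 for single-end reads, 0x40 / 0x80 for mate 1 / 2.
    source = {}
    for f in records:
        flag = int(f[1])
        is_primary = not flag & 0x900  # neither secondary nor supplementary
        if is_primary and f[9] != "*":
            source[flag & 0xC0] = (f[9], f[10], bool(flag & 0x10))

    for f in records:
        flag = int(f[1])
        if not flag & 0x100 or f[9] != "*" or (flag & 0xC0) not in source:
            continue
        if "H" in f[5]:
            # Hard-clipped CIGAR: the full-length SEQ would not match it
            continue
        seq, qual, source_is_reverse = source[flag & 0xC0]
        if bool(flag & 0x10) != source_is_reverse:
            seq = seq.translate(_COMPLEMENT)[::-1]
            if qual != "*":
                qual = qual[::-1]
        f[9], f[10] = seq, qual
    return records


def fill_stream(lines):
    """Yield SAM lines with SEQ/QUAL copied from primary to secondary records."""
    group, current = [], None
    for line in lines:
        if line.startswith("@"):  # header lines pass through unchanged
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        if group and fields[0] != current:
            for f in fill_group(group):
                yield "\t".join(f) + "\n"
            group = []
        current = fields[0]
        group.append(fields)
    if group:
        for f in fill_group(group):
            yield "\t".join(f) + "\n"
```

Used as a filter, a script containing the above would end with `import sys` and `sys.stdout.writelines(fill_stream(sys.stdin))` and sit between the read mapper and `samtools sort`. Only one group of records is held in memory at a time, which is the advantage of doing this before sorting.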
Or even better: Because the typical way of using strobealign is to pipe its output into samtools sort (strobealign ... | samtools sort -o sorted.bam), this could become an option in samtools sort instead, so you would have something like
strobealign ... | samtools sort --copy-primary-seq-and-qual-to-secondary -o out.bam
I tend to agree. https://github.com/samtools/samtools/issues/2232 is the corresponding samtools issue.