atropos
Add option to overwrite reads with nextseq low quality
One consequence of the recommended op-orders CGQAW and GAWCQ is that garbage reads may be trimmed to length 0, or to a length shorter than the window size given to the --overwrite-low-quality option. In paired-end sequencing on Illumina NextSeq and NovaSeq, two failure modes come to mind for a completely unusable read 2:
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
################################
and
GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
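To make the difference between the two cases concrete, here is a minimal sketch that decodes both quality strings, assuming Phred+33 encoding. The helper names and the window size of 20 are hypothetical and are not taken from the atropos code base; the sketch only illustrates why a check based on quality alone sees the first read as bad and the second as good, and why a read shorter than the window cannot be scored at all.

```python
def phred_scores(quality, offset=33):
    """Decode a FASTQ quality string into Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality]


def mean_window_quality(quality, window):
    """Mean Phred score over the first `window` bases.

    Returns None when the read is shorter than the window, mirroring
    the situation where no quality measurement is possible.
    """
    if len(quality) < window:
        return None
    scores = phred_scores(quality[:window])
    return sum(scores) / window


# The two example reads from above, as (sequence, quality) pairs.
poly_n = ("N" * 32, "#" * 32)
poly_g = ("G" * 32, "F" * 32)

WINDOW = 20  # hypothetical window size, chosen for illustration only

print(mean_window_quality(poly_n[1], WINDOW))  # 2.0: clearly low quality
print(mean_window_quality(poly_g[1], WINDOW))  # 37.0: looks high quality
print(mean_window_quality("", WINDOW))         # None: read trimmed to length 0
```

The poly-G read scores as high quality, so only a step that knows about NextSeq-style G artifacts can recognize it as unusable.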
The function that overwrites low-quality reads only does so if the read is longer than the window required to measure quality scores. One remedy is to change the --op-order flag so that the "overwrite poor quality reads" step runs first. However, this only overwrites reads in the former case and not the latter, because the poly-G read carries high quality scores and is not recognized as low quality. Two ways to address this are:
- Adding an option to overwrite reads when the two reads of a pair differ in length. This would allow quality trimming to occur before read overwriting while gracefully handling the case where the low-quality read is shorter than the window required for quality measurement and the high-quality read is longer. (For the cases I outlined above, checking for read pairs in which a single read was trimmed entirely is sufficient.)
- Allowing the --overwrite-reads option to treat Gs as low quality. This may not work well because the --overwrite-reads option looks at the beginning of the read, which can have Gs with high PHRED scores that are from the DNA template and not artifacts of sequencing.
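A rough sketch of what the first option could look like, run after trimming. Everything here is hypothetical: the function names, the (sequence, quality) tuple representation, and the choice to overwrite the short read with the reverse complement of its mate are assumptions made for illustration, not the existing atropos behavior.

```python
# Translation table for complementing DNA bases (N stays N).
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def overwrite_on_length_discrepancy(read1, read2, window):
    """Overwrite a too-short read with its mate (hypothetical behavior).

    `read1` and `read2` are (sequence, quality) tuples after trimming.
    If exactly one read is shorter than `window` (including length 0),
    it is replaced by the reverse complement of the other read, with
    the quality string reversed to match. Otherwise the pair is
    returned unchanged.
    """
    short1 = len(read1[0]) < window
    short2 = len(read2[0]) < window
    if short1 == short2:
        # Both reads are usable, or both are too short: nothing to do.
        return read1, read2
    good = read2 if short1 else read1
    replacement = (reverse_complement(good[0]), good[1][::-1])
    if short1:
        return replacement, read2
    return read1, replacement


# Read 2 was trimmed to length 0; read 1 survived trimming.
r1 = ("ACGTACGTAC", "FFFFFFFFF#")
r2 = ("", "")
new_r1, new_r2 = overwrite_on_length_discrepancy(r1, r2, window=8)
print(new_r1)
print(new_r2)
```

Because the check depends only on read length after trimming, it would cover both the all-N and the all-G case without needing to treat Gs as low quality.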
Thanks for this report. At first glance, I like option 1, but I'll need to consider it a bit.
In the meantime, a workaround is to write the orphaned reads to a separate file and align them separately.