sunbeam
Why remove trimmed reads?
Hello,
I'm wondering if you could please explain why the default is to remove reads with adapters. My intuition is that it would be fine to keep reads that are trimmed as long as they are long enough. If I wanted to work around this behavior, I would have to modify the actual Snakemake rule, right?
Thanks very much, Lev
https://github.com/sunbeam-labs/sunbeam/blob/d0e29cd47490194c3c4a1753837e54df7b12db55/rules/qc/qc.rules#L89
Hi Lev, thanks for the issue! I'm not sure why this is the default behavior.
I feel like there's some intuitive part of this I'm missing (maybe for our purposes, if we see the adapter the reads are too short and we don't want them?), but this code is from before I started working on Sunbeam.
Tagging @eclarke for a hopefully quick clarification on why this is the default. Maybe we should have a config option for this so that users don't have to resort to editing the rules?
Thanks for the response! Looking forward to hearing any additional thoughts from @eclarke.
I've done some googling, and while I'm still not sure why discarding trimmed reads is desirable, the `--discard-trimmed` option (known then as `--discard`) was added super early in the development of cutadapt (v0.5), even before the option to filter reads by minimum length (`--minimum-length`).
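To make the difference between the two policies concrete, here is a toy Python sketch contrasting "discard any read where the adapter was found" with "trim the adapter and keep the read if it is still long enough". It is a simplification, not cutadapt's implementation: it assumes an exact-match adapter search (cutadapt tolerates errors and partial adapter matches), and the function names, the example reads and the adapter sequence are hypothetical and purely illustrative.

```python
from typing import List, Optional


def trim_adapter(seq: str, adapter: str) -> Optional[str]:
    """Return seq cut at the first exact occurrence of adapter, or None if absent.

    Simplification: real adapter trimming allows mismatches and partial matches.
    """
    idx = seq.find(adapter)
    if idx == -1:
        return None
    return seq[:idx]


def filter_reads(reads: List[str], adapter: str,
                 discard_trimmed: bool, min_length: int = 0) -> List[str]:
    """Toy filter loosely mimicking --discard-trimmed and --minimum-length."""
    kept = []
    for seq in reads:
        trimmed = trim_adapter(seq, adapter)
        if trimmed is not None:
            if discard_trimmed:
                # Like --discard-trimmed: drop every read in which the adapter was found.
                continue
            seq = trimmed
        # Like --minimum-length: drop reads that are too short after trimming.
        if len(seq) < min_length:
            continue
        kept.append(seq)
    return kept


reads = ["ACGTACGTACGTAGATCGGAAG", "ACGTAGATCGGAAG", "ACGTACGTACGTACGT"]
adapter = "AGATCGGAAG"  # illustrative adapter sequence
print(filter_reads(reads, adapter, discard_trimmed=True))
# ['ACGTACGTACGTACGT']
print(filter_reads(reads, adapter, discard_trimmed=False, min_length=8))
# ['ACGTACGTACGT', 'ACGTACGTACGTACGT']
```

With discarding, both adapter-containing reads are lost; with trim-and-filter, the first read survives as a 12 bp fragment and only the 4 bp leftover is dropped, which is the behavior Lev's intuition describes.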
Also, from the cutadapt manuscript:
> In some cases, finding adapters is a sign of contamination, and the reads containing them must be discarded entirely.
So, discarding trimmed reads is desired in at least some cases. I don't think it'll hurt to make this user-configurable, though. To-do list for me:
- [ ] Add config option to let users configure cutadapt's behavior with trimmed reads (e.g. `--discard-trimmed`, `--discard-untrimmed`).
- [ ] Add tests for different cutadapt behaviors (confirm reads with or without adapters are trimmed/discarded/kept)
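One possible shape for the first to-do item, sketched in Python since Snakemake rules are Python-based: a helper that translates a config value into cutadapt filtering flags. The config keys `trimmed_reads` and `min_length`, the mode names, and the function name are hypothetical, not existing Sunbeam options; only the cutadapt flags themselves (`--discard-trimmed`, `--discard-untrimmed`, `--minimum-length`) are real. Defaulting to `discard_trimmed` assumes the goal is to preserve the current default behavior discussed in this issue.

```python
from typing import Any, Dict, List

# Hypothetical config modes mapped onto real cutadapt filtering flags.
FILTER_FLAGS: Dict[str, List[str]] = {
    "keep": [],                                   # keep reads whether or not they were trimmed
    "discard_trimmed": ["--discard-trimmed"],     # drop reads in which an adapter was found
    "discard_untrimmed": ["--discard-untrimmed"], # drop reads in which no adapter was found
}


def cutadapt_filter_args(qc_config: Dict[str, Any]) -> List[str]:
    """Build cutadapt filtering arguments from a (hypothetical) qc config section."""
    mode = qc_config.get("trimmed_reads", "discard_trimmed")
    if mode not in FILTER_FLAGS:
        raise ValueError(f"unknown trimmed_reads mode: {mode!r}")
    args = list(FILTER_FLAGS[mode])
    min_len = qc_config.get("min_length")
    if min_len is not None:
        # Only meaningful when trimmed reads are kept, but harmless otherwise.
        args += ["--minimum-length", str(min_len)]
    return args


print(cutadapt_filter_args({}))
# ['--discard-trimmed']
print(cutadapt_filter_args({"trimmed_reads": "keep", "min_length": 30}))
# ['--minimum-length', '30']
```

In a rule, the returned list could be joined into a `params` string and interpolated into the cutadapt shell command, and the second to-do item could then parametrize tests over the three modes.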