Binned quality scores in Illumina reads (paired-end)
Hi @marcelm,
This is a long-standing issue in other software: how to deal with binned quality scores.
Does Cutadapt have an official solution for handling the QC of sequencing files with binned quality scores?
Cutadapt doesn’t have any special treatment for binned quality scores. To be honest, I don’t know what to do about it. Did you mean anything in particular? I’d guess the only thing to keep in mind is that the threshold for -q (quality trimming) needs to be chosen with the bins in mind. For example, if you use -q 10 and increase it to -q 15, hoping that it will trim more, then nothing will actually change if you have a bin that covers 10-19.
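To make the point about bins concrete, here is a minimal Python sketch. It is not Cutadapt code and does not reproduce how -q actually trims; it only uses a plain per-base comparison against a threshold. The read and its four quality values (2, 12, 23, 37) are made up for illustration, and the function names are hypothetical. It shows that any two thresholds falling between the same pair of adjacent bin values flag exactly the same bases.

```python
def distinct_qualities(quality_strings, base=33):
    """Return the sorted set of Phred scores found in FASTQ quality strings."""
    return sorted({ord(ch) - base for q in quality_strings for ch in q})


def low_quality_positions(quality_string, threshold, base=33):
    """Return indices of bases whose Phred score is below the threshold.

    This is a plain per-base comparison for illustration only,
    not the algorithm Cutadapt uses for -q.
    """
    return [i for i, ch in enumerate(quality_string) if ord(ch) - base < threshold]


# Hypothetical binned read (Phred+33): '#'=2, '-'=12, '8'=23, 'F'=37
read = "FFFF8F8--#"

bins = distinct_qualities([read])
print("quality values present:", bins)

# Thresholds 13 and 20 both lie between the bin values 12 and 23,
# so they flag the same positions.
print(low_quality_positions(read, 13))
print(low_quality_positions(read, 20))

# Only a threshold above the next bin value (23) flags more bases.
print(low_quality_positions(read, 24))
```

Listing the distinct quality values in your own files first is a cheap way to see which thresholds are actually distinguishable.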
Not really. Currently, this can cause problems for some algorithms that use quality scores to predict errors or construct error models (e.g. ASV in 16S amplicons). I think bearing in mind the bins is a good strategy, but the general quality of the reads will always mask a low- or ultra-low-quality point where the rest is good. Of course, it depends a lot on the final aim and the question driving the sequencing, but I can see that this could potentially cause problems. Maybe I'm concerned too much, but I'd like to hear your perspective and find out if Cutadapt has an official solution.
To clarify:
- Cutadapt does not use quality scores when finding adapters.
- The only Cutadapt options that use quality values are, as far as I am aware, the following ones: -q, -Q, --nextseq-trim, --max-expected-errors, and --max-average-error-rate. The first three trim reads, and the consideration I mentioned above regarding the -q threshold holds for them. The latter two filter reads based on quality values, and I expect that binned quality values are not a big problem because they compute an average over all quality values in the read.
- Cutadapt does not modify quality values; it only ever removes them (those that belong to bases that it trims). So whatever problematic binning goes in comes out in the same way.
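For the two filtering options, the read-level aggregation can be sketched as follows. This is the standard Phred arithmetic (error probability 10^(-Q/10) per base), written as an illustration rather than taken from Cutadapt's source; the function names and the Phred+33 offset are assumptions. Because every base contributes to the total, a single coarse bin value shifts the result only slightly.

```python
def error_probabilities(quality_string, base=33):
    """Convert a FASTQ quality string to per-base error probabilities."""
    return [10 ** (-(ord(ch) - base) / 10) for ch in quality_string]


def expected_errors(quality_string, base=33):
    """Sum of per-base error probabilities over the whole read."""
    return sum(error_probabilities(quality_string, base))


def average_error_rate(quality_string, base=33):
    """Expected errors divided by read length (0.0 for an empty read)."""
    probs = error_probabilities(quality_string, base)
    return sum(probs) / len(probs) if probs else 0.0


# Hypothetical read: four bases at Q20 ('5' in Phred+33)
read = "5555"
print(expected_errors(read))     # roughly 0.04
print(average_error_rate(read))  # roughly 0.01
```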
I don’t know what else to say, but I’m happy to discuss if you find a specific problem that is caused by binned quality values and their interaction with something in Cutadapt.
@marcelm thank you for the input. I'm aware of this, of course. In some pipelines this is used for trimming with the -q value, and there I see your point. I just wanted to understand whether, from your side, there is anything to worry about in relation to what Cutadapt was developed for, but as far as I understand, this should not interfere with adapter/primer removal.
Thank you again.
Exactly, I don’t think binned quality values matter for Cutadapt. Maybe it’s worth documenting this – please leave the issue open for now and I’ll see whether I can add a short paragraph to the documentation.