
Use of priors with ITS data

Open ramiroricardo opened this issue 1 year ago • 4 comments

Dear all,

We have an ITS dataset to analyse, and there is a species of interest that we would like to detect with greater sensitivity. We could use pooling, but we were also wondering whether using priors makes sense for ITS data.

In our case, we are analyzing only forward reads and using truncQ and maxEE, but not truncLen. This leads to reads of variable length, in principle even for the exact same taxon. In #496, it is indicated that priors must have exactly the same length as the processed reads, which is fine for 16S data, but I am guessing it won't work for ITS data. This would mean we should not (yet) use priors with ITS data.

Am I missing something / is there a way in which priors can still be included when analyzing ITS data?

Thanks

ramiroricardo avatar Aug 10 '22 10:08 ramiroricardo

My first suggestion is to stop using truncQ, to get rid of the artificial length variation that this filtering technique introduces. It is almost never the right choice when analyzing data with DADA2.

If that is eliminated, then it becomes possible to use priors with ITS data, although one does have to be precise about setting the prior sequences to be of the same length as they will be in the data. However, if the target species ITS sequence is known, that should be achievable. Note that if working with paired-end data, you'll need to create priors separately for the forward and reverse reads.
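
As a rough illustration of the length requirement (a plain Python sketch, not DADA2 code; the function name `prior_matches` and the toy sequences are hypothetical), one could check whether a candidate prior is identical over its full length to at least one processed read before supplying it as a prior:

```python
from collections import Counter

def prior_matches(prior, reads):
    """Count the processed reads that are identical to the candidate prior.

    A prior that is a prefix or an extension of a read does not count:
    the comparison is an exact, full-length string match.
    """
    return Counter(reads)[prior]

# Toy processed forward reads (hypothetical data, for illustration only).
reads = ["ACGTACGTAC", "ACGTACGTAC", "ACGTACGT", "TTGTACGTAC"]

exact_prior = "ACGTACGTAC"       # same length as the matching reads
too_long_prior = "ACGTACGTACGG"  # correct sequence, but two bases too long

print(prior_matches(exact_prior, reads))     # 2
print(prior_matches(too_long_prior, reads))  # 0
```

A count of zero for a sequence that is expected to be in the sample would suggest that the prior was cut to a different length than the reads.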

benjjneb avatar Aug 10 '22 14:08 benjjneb

Thanks for your quick reply. Just a further question: we were using truncQ because it is recommended in the DADA2 ITS workflow. Why would it not be recommended? Is it because we are using forward reads only?

ramiroricardo avatar Aug 10 '22 16:08 ramiroricardo

Hm, I see that there now. By default truncQ=2 is enforced anyway by filterAndTrim, as several variations of Casava/Illumina would start assigning Q=2 when they no longer had any idea what the bases were. If you have just been using truncQ=2, I'm not too concerned actually, as that should introduce relatively little length variation. But the way the tutorial is written, it looks like truncQ=XX is a parameter we recommend potentially tuning, and that's not the case even for ITS data.
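
To make the length-variation point concrete, here is a minimal Python sketch (not DADA2 code; the function name `trunc_at_quality` and the toy quality scores are hypothetical). It assumes the documented truncQ behavior of cutting a read at the first base whose quality score is less than or equal to the threshold:

```python
def trunc_at_quality(seq, quals, trunc_q=2):
    """Cut the read at the first base whose quality score is <= trunc_q."""
    for i, q in enumerate(quals):
        if q <= trunc_q:
            return seq[:i]
    return seq

# Two reads with the same sequence but different quality profiles (toy data).
seq = "ACGTACGTAC"
good = [38] * 10
dip = [38, 38, 38, 38, 38, 38, 12, 38, 38, 38]  # one mediocre base at position 7

for threshold in (2, 15):
    lengths = (len(trunc_at_quality(seq, good, threshold)),
               len(trunc_at_quality(seq, dip, threshold)))
    print(threshold, lengths)
```

With the threshold at 2, both reads keep all 10 bases. With the threshold at 15, the second read is cut to 6 bases, so two reads from the same sequence no longer have the same length.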

benjjneb avatar Aug 10 '22 16:08 benjjneb

Thanks for the reply. Indeed we started wondering about changing truncQ after seeing this paper, where several values are tested: https://insight.jci.org/articles/view/151663

ramiroricardo avatar Aug 13 '22 17:08 ramiroricardo