The best threshold and other questions about Decontam

Open · kla44 opened this issue 3 years ago · 1 comment

Hi! First of all, thank you for the Decontam tool, which I am very interested in.

I tested Decontam on my data with the frequency method, and I have a few questions for you.

  • First, the threshold. I made a histogram and a scatterplot to find the best threshold to use. There are many possible thresholds, and we can't agree on the best one. So we wanted to ask you: which threshold would you choose based on these charts?

[Images: the histogram and scatterplot mentioned above]

  • And, to your knowledge, are there any automated methods for selecting this threshold?

  • Another question came up: which taxonomic level should we work at? The recommendation is to use the lowest taxonomic level, which here is the OTU. But would it not be more relevant to work at the genus level rather than the OTU level?

  • And if we do work at the OTU level, is it acceptable that some taxa/OTUs have low prevalence (2 of 100) in our results?

  • Moreover, regarding our DNA concentration range: do you think the range shown in this summary is wide enough (DNA concentrations are in ng/µL)?

[Image: summary of the DNA concentrations (ng/µL)]

  • And, last question: do you recommend applying filters to our data before using Decontam? If so, which ones?

Thank you in advance for your help!

kla44, Apr 30 '21 08:04

There are many possible thresholds, and we can't agree on the best one. So we wanted to ask you: which threshold would you choose based on these charts?

Is it more important for your analysis to be sure you have removed likely contaminants? Or do you want to be more "conservative" in contaminant removal, and only remove the ASVs/OTUs that are definite contaminants? Once you have answered that question, you will know which threshold is the best choice: a higher threshold flags more features as contaminants, a lower one flags only the clearest cases.

And, to your knowledge, are there any automated methods for selecting this threshold?

No, because the optimal threshold depends on the answer to the question above, i.e. which balance between sensitivity and specificity in this classification problem is best for your downstream applications.
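To make the trade-off concrete, here is a minimal Python sketch (Decontam itself is an R package; nothing below is its API). It assumes, as in Decontam's output, that each feature gets a score and is classified as a contaminant when that score falls below the threshold. The scores and OTU names are made up for illustration. Raising the threshold can only add features to the flagged set, which is the sensitivity vs. specificity dial described above.

```python
# Hypothetical per-feature scores (made-up values), standing in for the
# score Decontam reports for each OTU/ASV.
scores = {
    "OTU_1": 0.02,
    "OTU_2": 0.08,
    "OTU_3": 0.15,
    "OTU_4": 0.35,
    "OTU_5": 0.62,
    "OTU_6": 0.91,
}

def flagged(scores, threshold):
    """Return the features classified as contaminants (score < threshold)."""
    return sorted(name for name, s in scores.items() if s < threshold)

# Sweep a few candidate thresholds and report how many features each removes.
for threshold in (0.1, 0.2, 0.5):
    hits = flagged(scores, threshold)
    print(f"threshold={threshold}: {len(hits)} flagged -> {hits}")
```

Listing the flagged features at each candidate threshold, rather than picking one automatically, keeps the decision tied to how costly a missed contaminant or a lost real taxon is for your analysis.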

Another question came up: which taxonomic level should we work at? The recommendation is to use the lowest taxonomic level, which here is the OTU. But would it not be more relevant to work at the genus level rather than the OTU level?

More specific is better. A contaminant and a real taxon can share a genus, but they are less likely to share the exact same ASV.
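A small Python sketch of why aggregating to genus can hurt, assuming the frequency-method idea that a contaminant's relative frequency scales like 1/concentration (slope -1 on a log-log plot) while a real sequence's frequency does not depend on concentration (slope 0). The two ASVs, their genus, and all numbers are hypothetical. Summing them into one genus gives an intermediate slope, so the genus-level signal is weaker than the ASV-level one.

```python
import math

# Hypothetical total DNA concentration (ng/uL) in each sample.
conc = [1.0, 2.0, 5.0, 10.0, 20.0]

# A contaminant ASV: relative frequency proportional to 1/concentration.
contaminant = [0.05 / c for c in conc]
# A real ASV from the same (hypothetical) genus: constant relative frequency.
real = [0.01 for _ in conc]
# Aggregating to genus sums the two ASVs within each sample.
genus = [a + b for a, b in zip(contaminant, real)]

def loglog_slope(x, y):
    """Ordinary least-squares slope of log(y) regressed on log(x)."""
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den

for name, freqs in [("contaminant ASV", contaminant), ("real ASV", real), ("genus total", genus)]:
    print(f"{name}: log-log slope = {loglog_slope(conc, freqs):.2f}")
```

The contaminant ASV has a slope of -1 and the real ASV a slope of 0; the genus total falls in between, so it looks neither clearly like a contaminant nor clearly like a real taxon.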

And if we do work at the OTU level, is it acceptable that some taxa/OTUs have low prevalence (2 of 100) in our results?

That is fine at this stage, although you may want to drop low-prevalence OTUs/ASVs at some other step.
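If you do drop low-prevalence features at a later step, a prevalence filter can be as simple as the Python sketch below. It is only an illustration: the OTU table is made up, and the cutoff `min_samples` is a hypothetical parameter, since no specific value is recommended in this thread.

```python
# Hypothetical OTU table: feature -> read counts in each sample (made-up values).
otu_table = {
    "OTU_1": [0, 12, 0, 0, 5],
    "OTU_2": [30, 25, 40, 18, 22],
    "OTU_3": [0, 0, 0, 3, 0],
}

def prevalence(counts):
    """Number of samples in which the feature was observed (count > 0)."""
    return sum(1 for c in counts if c > 0)

def prevalence_filter(table, min_samples):
    """Keep only features present in at least `min_samples` samples."""
    return {name: counts for name, counts in table.items()
            if prevalence(counts) >= min_samples}

kept = prevalence_filter(otu_table, min_samples=2)
print(sorted(kept))  # -> ['OTU_1', 'OTU_2']
```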

Moreover, regarding our DNA concentration range: do you think the range shown in this summary is wide enough (DNA concentrations are in ng/µL)?

It's wide enough, but a wider concentration range would make frequency-based contaminant classification more powerful and accurate.
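Why a wider range helps can be sketched in Python by fitting the two models behind the frequency method on a log scale: a contaminant model, log(freq) = k - log(conc), and a non-contaminant model, log(freq) = k. This is a simplified comparison of residual sums of squares, not the exact score Decontam computes. The concentrations, the "noise" factors, and the contaminant's frequencies are all made up.

```python
import math

def rss_two_models(conc, freq):
    """Residual sums of squares (log scale) for the two frequency-method models.

    Contaminant model:     log(freq) = k - log(conc)  (frequency ~ 1/concentration)
    Non-contaminant model: log(freq) = k              (frequency independent of concentration)
    Each intercept k is fit by least squares, i.e. it is the mean of the targets.
    """
    lc = [math.log(c) for c in conc]
    lf = [math.log(f) for f in freq]
    # Contaminant model: best k is the mean of log(freq) + log(conc).
    k_contam = sum(f + c for f, c in zip(lf, lc)) / len(lf)
    rss_contam = sum((f - (k_contam - c)) ** 2 for f, c in zip(lf, lc))
    # Non-contaminant model: best k is the mean of log(freq).
    k_const = sum(lf) / len(lf)
    rss_const = sum((f - k_const) ** 2 for f in lf)
    return rss_contam, rss_const

# Hypothetical concentration sets (ng/uL): a narrow range and a wider one.
narrow = [4.0, 5.0, 6.0, 7.0, 8.0]
wide = [1.0, 3.0, 10.0, 30.0, 100.0]
# Fixed multiplicative noise factors so the example is deterministic.
noise = [1.2, 0.85, 1.1, 0.9, 1.05]

for label, conc in [("narrow", narrow), ("wide", wide)]:
    # A made-up contaminant whose frequency scales like 1/concentration.
    freq = [0.02 * n / c for n, c in zip(noise, conc)]
    rss_c, rss_0 = rss_two_models(conc, freq)
    print(f"{label}: RSS contaminant model = {rss_c:.3f}, RSS constant model = {rss_0:.3f}")
```

The contaminant model fits equally well in both cases (its residuals are just the noise), but the wrong, constant model fits much worse over the wide range, so the two models are easier to tell apart when concentrations span more orders of magnitude.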

And, last question: do you recommend applying filters to our data before using Decontam? If so, which ones?

You might want to apply other filters before or after decontam, but fundamentally decontam operates on a per-feature basis, so it isn't really affected by feature filtering.

benjjneb, May 03 '21 18:05