Difference between isContaminant and isNotContaminant

Open Jewelna opened this issue 4 years ago • 4 comments

I recently read your very useful paper in Microbiome on identifying contaminants in metagenomic sequencing data.

I would be very grateful if you could clarify the difference between using isContaminant and isNotContaminant. I have tried both on my samples and they give different results: one identifies some ASVs as contaminants while the other does not.

Should I use a cutoff of p < 0.05 to decide which of the flagged contaminants have been correctly identified?

Jewelna avatar Sep 06 '19 20:09 Jewelna

The key difference is that isContaminant is identifying contaminants "conservatively", in the sense that it requires sufficient positive evidence that something is a contaminant before calling it such. isNotContaminant flips the burden of proof on its head, and requires sufficient positive proof an ASV is not a contaminant before calling it so. So its default assumption is that everything is a contaminant until proved otherwise.
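To make the "burden of proof" idea concrete, here is a toy sketch in Python. It is not decontam's code (decontam is an R package), and everything in it is an assumption made for illustration: the helper names `call_contaminant` and `call_not_contaminant`, the made-up scores, the thresholds 0.1 and 0.5, and the choice to flip the score as `1 - p` for the stricter rule. The one thing it is meant to show is what happens to ASVs with ambiguous or missing evidence under each rule.

```python
from typing import Optional

# Toy scores per ASV (hypothetical values). A small score means strong evidence
# FOR contamination; None means there was too little data to compute a score.
scores = {"ASV1": 0.01, "ASV2": 0.95, "ASV3": None, "ASV4": 0.30}


def call_contaminant(p: Optional[float], threshold: float = 0.1) -> bool:
    """isContaminant-style rule (sketch): flag an ASV only when there is
    positive evidence that it is a contaminant. No score means no flag."""
    return p is not None and p < threshold


def call_not_contaminant(p: Optional[float], threshold: float = 0.5) -> bool:
    """isNotContaminant-style rule (sketch): clear an ASV only when there is
    positive evidence that it is NOT a contaminant. No score means not cleared."""
    return p is not None and (1.0 - p) < threshold


# Conservative rule: keep everything that was not positively flagged.
kept_conservative = [asv for asv, p in scores.items() if not call_contaminant(p)]

# Flipped rule: keep only what was positively cleared.
kept_flipped = [asv for asv, p in scores.items() if call_not_contaminant(p)]

print(kept_conservative)  # ['ASV2', 'ASV3', 'ASV4']
print(kept_flipped)       # ['ASV2']
```

ASV3 (no score) and ASV4 (ambiguous score) survive under the first rule but are dropped under the second, which is the "everything is a contaminant until proved otherwise" behaviour described above.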

Our guidance is that in most use cases you should use isContaminant, unless your samples are so low biomass that the concentration of contaminating DNA is expected to be as high as or higher than that of true sample DNA (e.g. in near-sterile environments like placenta samples).

benjjneb avatar Sep 09 '19 15:09 benjjneb

Thanks. That is very helpful. I have also realised that some features that are expected in my test samples are being identified as contaminants, even though they are not present in the negative samples. Do I therefore accept them as contaminants? I am a bit confused.

Jewelna avatar Sep 10 '19 14:09 Jewelna

Are you using isContaminant? If so that shouldn't be happening.

Are you using isNotContaminant? If so, that can be expected if there is insufficient evidence (replication) of those taxa in your true samples. In that case, I would reconsider whether you actually want to be using isNotContaminant -- that method is really intended for extremely low biomass samples only.

benjjneb avatar Sep 10 '19 15:09 benjjneb

I was using isNotContaminant initially; I guess that was the reason. I got it right now with isContaminant. The results make more sense with respect to my data.

Thank you

Jewelna avatar Sep 10 '19 15:09 Jewelna