
Negative controls not separating out

Open hpremathilake opened this issue 4 years ago • 2 comments

I used the "prevalence" method to filter out contaminants. Everything worked fine, but when I graph Prevalence (True Samples) against Prevalence (Negative Controls), the data do not separate cleanly the way they do in your demonstration at https://benjjneb.github.io/decontam/vignettes/decontam_intro.html#identify-contaminants---prevalence

The graph I get is attached. Would you be able to please clarify whether this is an indication of a successful prevalence based decontamination or not?

Thank you, Best Rgds, Hasitha.

[Attached images: Decontam_Graph1, Decontam_Graph2]

hpremathilake avatar Sep 06 '19 17:09 hpremathilake

What this suggests is that there is not a clear separation of all contaminants from non-contaminants in this dataset, at least based on prevalence patterns. This does not mean that contaminant classification didn't work, but it does suggest that it will not be perfect and that some contaminants probably will still be present in the data.

Some questions about why this might be happening in your data: How low-biomass are the samples you are working with? Is it possible that contaminant DNA is as abundant as sample DNA in the true samples?

A second thing to consider is that the plot here is not as informative as it could be, because many points are plotted directly on top of one another. You may want to try a geom_jitter graphical layer to show the "cloud" of points at each position. It may also be worth looking at just the most abundant ASVs, say the top 10%. In our data those are often the most clearly separated, which is a good thing, as they are the most likely to impact subsequent analysis.
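A minimal sketch of both suggestions, assuming a data frame shaped like the `df.pa` built in the vignette (columns `pa.pos`, `pa.neg`, `contaminant`). The toy data, the `abundance` column, and the jitter amounts are made up for illustration and would need to be replaced with values from your own dataset:

```r
library(ggplot2)

set.seed(1)  # reproducible toy data and jitter

# Toy stand-in for the vignette's df.pa: one row per ASV.
# 'abundance' is a hypothetical extra column (e.g. total reads per ASV).
n_asv <- 200
df.pa <- data.frame(
  pa.pos      = sample(0:40, n_asv, replace = TRUE),  # prevalence in true samples
  pa.neg      = sample(0:6,  n_asv, replace = TRUE),  # prevalence in negative controls
  contaminant = sample(c(TRUE, FALSE), n_asv, replace = TRUE, prob = c(0.1, 0.9)),
  abundance   = rexp(n_asv)
)

# Keep only the most abundant 10% of ASVs.
cutoff <- quantile(df.pa$abundance, 0.9)
df.top <- df.pa[df.pa$abundance >= cutoff, ]

# geom_jitter adds a small random offset to each point, so ASVs that
# share the same (pa.neg, pa.pos) position show up as a cloud.
p <- ggplot(df.top, aes(x = pa.neg, y = pa.pos, color = contaminant)) +
  geom_jitter(width = 0.2, height = 0.2, alpha = 0.6) +
  xlab("Prevalence (Negative Controls)") +
  ylab("Prevalence (True Samples)")
print(p)
```

Keeping the jitter width and height well below 0.5 keeps each cloud centered on its integer prevalence value, so neighboring positions do not blend together.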

benjjneb avatar Sep 09 '19 15:09 benjjneb

I'm working on fecal and rumen samples from cattle, so the biomass should not be that low. The total genomic DNA content of those samples can range from 40 ng/µl to 200 ng/µl. Of course, the actual bacterial DNA content would be much lower, since the genomic DNA extraction would contain host and plant DNA as well. I highly doubt that contaminant DNA could be as abundant as true sample DNA, since I used autoclaved molecular-grade water as the blank and have rarely seen a total genomic DNA concentration above 5 ng/µl in any of the negative controls I ran (unless the kit reagents are heavily contaminated, which is unlikely).

I will try the suggestion of using only the most abundant 10% of ASVs and see whether it improves the separation. Thank you very much.

hpremathilake avatar Sep 09 '19 22:09 hpremathilake