Didn't find contaminants, how to proceed?

Open JoaoMiranda96 opened this issue 2 years ago • 4 comments

Hi,

I'm new to microbiome analysis and R. I have low-biomass samples from mosquito midguts. I have two negative controls (one blank from DNA extraction and one from PCR), and both yielded reads, so I decided to use decontam to filter these contaminants from my samples. Following the tutorial, I chose the prevalence method, but to my surprise, after running `isContaminant` all 4926 of my ASVs came back as FALSE. I then tried setting the threshold to 0.5, 0.6, 0.7, and so on, and the result is the same: all 4926 ASVs are FALSE. I read this issue https://github.com/benjjneb/decontam/issues/90 and I may have a similar case, since I have only two negative controls. Could you help me check whether I did something wrong in my code? If not, how can I filter my samples without decontam? I'm also using qiime2 for my microbiome analysis.

Here is my code:

```r
library(qiime2R)
library(phyloseq)
library(ggplot2)
library(decontam)

ASVs <- read_qza(file.choose())
ASVs$data[1:5, 1:5]  # show first 5 samples and first 5 taxa
names(ASVs)
metadata <- read_q2metadata(file.choose())
head(metadata)
physeq <- qza_to_phyloseq(features = file.choose(), metadata = file.choose())
physeq
head(sample_data(physeq))

df <- as.data.frame(sample_data(physeq))  # Put sample_data into a ggplot-friendly data.frame
df$LibrarySize <- sample_sums(physeq)
df <- df[order(df$LibrarySize), ]
df$Index <- seq(nrow(df))
ggplot(data = df, aes(x = Index, y = LibrarySize, color = sample_or_control)) + geom_point()
??decontam

sample_data(physeq)$is.neg <- sample_data(physeq)$sample_or_control == "Control Sample"
contamdf.prev <- isContaminant(physeq, method = "prevalence", neg = "is.neg")
table(contamdf.prev$contaminant)
contamdf.prev05 <- isContaminant(physeq, method = "prevalence", neg = "is.neg", threshold = 0.5)
table(contamdf.prev05$contaminant)
```

Thanks in advance!

JoaoMiranda96 avatar Oct 17 '21 23:10 JoaoMiranda96

Two negative controls isn't really enough, and you should probably consider methods other than decontam for this data.

That said, that every ASV is showing up as non-contaminant is somewhat surprising. Two things I would look at:

1. What is the read depth of the two control samples? And for comparison, the distribution of read depths in the real samples?

2. What is the distribution of decontam scores, e.g. `hist(contamdf.prev$p, n=100)`?

Ok, third thing, what kind of controls are these? For example, did they also go through DNA extraction, or were they added later in the process?
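The first two checks can be sketched in base R on a toy count matrix, so the example runs without phyloseq or decontam. The matrix, the sample names, and the `scores` vector are all invented for illustration; with real data the depths would come from `sample_sums(physeq)` and the scores from `contamdf.prev$p`.

```r
# Toy samples-by-ASVs count matrix; all values are invented for illustration.
counts <- matrix(
  c(120, 30,  0,
     80,  0, 15,
     40, 25,  5,
     10,  2,  0),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("S1", "S2", "Blank", "PCRneg"),
                  c("ASV1", "ASV2", "ASV3"))
)
is_neg <- c(S1 = FALSE, S2 = FALSE, Blank = TRUE, PCRneg = TRUE)

# Check 1: read depth per sample, summarised separately for controls and samples.
depth <- rowSums(counts)  # with phyloseq: sample_sums(physeq)
depth_by_group <- split(depth, ifelse(is_neg, "control", "sample"))
print(lapply(depth_by_group, summary))

# Check 2: distribution of scores. Count missing values first, because
# hist() has nothing to plot if every score is NA.
scores <- c(ASV1 = 0.82, ASV2 = NA, ASV3 = 0.31)  # stand-in for contamdf.prev$p
n_missing <- sum(is.na(scores))
cat(n_missing, "of", length(scores), "scores are NA\n")
if (n_missing < length(scores)) {
  # plot = FALSE only computes the bins; set plot = TRUE to draw the histogram.
  score_hist <- hist(scores, breaks = 100, plot = FALSE)
}
```

Counting the NA scores before plotting makes it obvious when no score was computed at all, which is a different situation from scores that simply fall below the threshold.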

benjjneb avatar Oct 17 '21 23:10 benjjneb

Hi @benjjneb, thank you very much for the prompt response!

1 - The PCR control had 4,709 reads and the blank control had 23,196. The mean read count in the real samples was 43,430.

2 - I ran your code in R and the output was:

head(contamdf.prev$p, n=100)
 [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
[32] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
[63] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
[94] NA NA NA NA NA NA NA

3 - The blank is the negative control that went through the entire DNA extraction, but with reagents instead of DNA. The other negative control is from the PCR; it was added after the DNA extraction.

Which method would you recommend for my case instead of decontam?

I think this might be helpful, my library size:

[image: library size plot]

JoaoMiranda96 avatar Oct 18 '21 00:10 JoaoMiranda96

Your blank, i.e. your extraction control, has a read count in line with many of the real samples, and much higher than the non-extraction control. This suggests that DNA extraction is a major contributor to contamination (normal), and that contaminants could be a significant fraction of your data in many samples (not necessarily, but definitely could be).

With essentially one informative control sample, the options are much more limited for trying to remove contaminant taxa. I don't have any strong recommendations here. I'd probably explore how much overlap there is between the taxa found in the extraction control and the true samples, and then perhaps remove most if not all of the taxa present in the negative control. But I also would not expect this to fully solve contamination here.
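One way to explore that overlap is sketched below in base R on a toy count matrix, so it runs without phyloseq. The matrix values and the control name `Blank` are invented for illustration, and dropping every control-associated ASV is only the strictest of the options mentioned above, not a recommended default.

```r
# Toy samples-by-ASVs count matrix; all values are invented for illustration.
counts <- matrix(
  c(120, 30,  0, 7,
     80,  0, 15, 0,
      0, 25,  5, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("S1", "S2", "Blank"),
                  c("ASV1", "ASV2", "ASV3", "ASV4"))
)
control_name <- "Blank"  # hypothetical name of the extraction control

# Which ASVs are present in the control, and which in at least one true sample?
in_control <- counts[control_name, ] > 0
sample_counts <- counts[rownames(counts) != control_name, , drop = FALSE]
in_samples <- colSums(sample_counts) > 0

# ASVs seen in both the extraction control and at least one true sample.
shared <- names(which(in_control & in_samples))
print(shared)

# Fraction of each true sample's reads that belongs to control-associated ASVs.
frac_shared <- rowSums(sample_counts[, in_control, drop = FALSE]) /
  rowSums(sample_counts)
print(round(frac_shared, 3))

# Strictest option: drop every ASV that is present in the control.
filtered <- sample_counts[, !in_control, drop = FALSE]
print(colnames(filtered))
```

The per-sample fraction gives a rough sense of how much of each sample would be lost before anything is removed. On a phyloseq object, a comparable filter could probably be written with `prune_taxa()` and a logical keep-vector, though that is an assumption about the workflow rather than something shown in this thread.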

benjjneb avatar Oct 19 '21 14:10 benjjneb

@benjjneb Thank you so much for your attention and recommendations.

JoaoMiranda96 avatar Oct 20 '21 17:10 JoaoMiranda96