
Prevalence method figure: Prevalence (Negative Controls) on the x-axis vs. Prevalence (True Samples) on the y-axis?

Open · marwa38 opened this issue 2 years ago

Hello. I used the prevalence method, but the plot I get with the commands below looks different from the one in the tutorial (mine has just a few points, as you can see in the figures attached below). Could you please help me understand this?

```r
library(decontam)
library(phyloseq)
library(ggplot2)

contamdf.prev05 <- isContaminant(ps.decon, method="prevalence", neg="is.neg", threshold=0.5)
table(contamdf.prev05$contaminant)
# FALSE  TRUE 
# 2029    16

# Make phyloseq object of presence-absence in negative controls and true samples
ps.pa <- transform_sample_counts(ps.decon, function(abund) 1*(abund>0))
ps.pa.neg <- prune_samples(sample_data(ps.pa)$phase == "negative", ps.pa)
ps.pa.pos <- prune_samples(sample_data(ps.pa)$phase == "positive", ps.pa)
# Make data.frame of prevalence in positive and negative samples
df.pa <- data.frame(pa.pos=taxa_sums(ps.pa.pos), pa.neg=taxa_sums(ps.pa.neg),
                      contaminant=contamdf.prev05$contaminant)

ggplot(data=df.pa, aes(x=pa.neg, y=pa.pos, color=contaminant)) + geom_point() +
  xlab("Prevalence (Negative Controls)") + ylab("Prevalence (True Samples)")
```

Attachment: df.pa.zip. Session info: decontam_1.14.0

In the tutorial:

[Figure: prevalence plot from the decontam tutorial]

Mine:

[Figure: my prevalence plot]

Many thanks

marwa38 · Feb 27 '22 10:02

I'm not sure what your question is.

benjjneb · Mar 01 '22 23:03

Oops, sorry, I missed your answer. My figure doesn't show a pattern similar to the one in the decontam tutorial (see the two figures attached to the original post). I'm not sure I understand what the figure is telling me. What do you think? Could you please comment? Many thanks, @benjjneb

marwa38 · Nov 03 '22 11:11

Your figure shows such a small number of samples (3 negative controls, 2 real samples) that I don't think decontam is even a useful tool here. You'll need to develop some sort of ad hoc approach to removing contaminants (e.g. removing everything that appears in >2 negative controls), or return to this when you have your full dataset.
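The ad hoc rule suggested above (drop every taxon detected in more than two negative controls) can be sketched in base R. This is a minimal illustration, not part of decontam: the toy count matrix (samples as rows, taxa as columns), the `is.neg` flag and the `max.neg` cutoff are all hypothetical stand-ins. Note that with only three negative controls, "more than two" means "detected in every negative control".

```r
# Hypothetical toy data: 5 samples (rows) x 4 taxa (columns).
counts <- matrix(c(5, 0, 3, 1,
                   2, 2, 0, 1,
                   1, 0, 0, 4,
                   7, 1, 0, 0,
                   0, 0, 6, 2),
                 nrow = 5, byrow = TRUE,
                 dimnames = list(paste0("S", 1:5), paste0("ASV", 1:4)))
# Hypothetical flag marking the first three samples as negative controls.
is.neg <- c(TRUE, TRUE, TRUE, FALSE, FALSE)

# Number of negative controls in which each taxon is detected (count > 0).
neg.prev <- colSums(counts[is.neg, , drop = FALSE] > 0)

# Ad hoc rule: flag taxa detected in more than `max.neg` negative controls.
max.neg <- 2
contaminant <- neg.prev > max.neg

# Drop the flagged taxa (columns); all samples are kept.
clean <- counts[, !contaminant, drop = FALSE]
print(colnames(clean))
# [1] "ASV2" "ASV3"
```

With a phyloseq object, a logical vector like `contaminant` (one entry per taxon, in taxa order) could be passed as `prune_taxa(!contaminant, ps)`, since `prune_taxa()` accepts a logical vector of taxa to keep.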

benjjneb avatar Nov 03 '22 14:11 benjjneb

Thanks for your reply, @benjjneb. What do you mean by "real samples"? My actual samples, or samples that are not contaminated? Here are my sample numbers:

[Figure: sample numbers]

marwa38 · Nov 03 '22 14:11

Your previous figure shows a maximum "Prevalence (True Samples)" of 2, and a maximum "Prevalence (Negative Controls)" of 3. So, either that figure is plotted incorrectly, or there is nearly no overlap between the taxa found in various samples.
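Because prevalence here is the number of samples in a group in which a taxon is detected, the largest value on each axis can never exceed the number of samples in that group. Comparing those maxima with the expected group sizes is a quick check that the subsetting step picked the intended samples. A minimal base-R sketch, using a hypothetical toy matrix (samples as rows) and a hypothetical `is.neg` flag in place of a phyloseq object:

```r
# Hypothetical toy data: 5 samples (rows) x 4 taxa (columns).
counts <- matrix(c(5, 0, 3, 0,
                   0, 2, 0, 0,
                   1, 0, 0, 4,
                   7, 1, 0, 0,
                   0, 0, 6, 2),
                 nrow = 5, byrow = TRUE,
                 dimnames = list(paste0("S", 1:5), paste0("ASV", 1:4)))
# Hypothetical grouping, mirroring the thread: 3 negatives, 2 other samples.
is.neg <- c(TRUE, TRUE, TRUE, FALSE, FALSE)

# Prevalence = number of samples in the group in which each taxon is present.
prevalence <- function(counts, keep) colSums(counts[keep, , drop = FALSE] > 0)

pa.neg <- prevalence(counts, is.neg)
pa.pos <- prevalence(counts, !is.neg)

# The bound always holds: prevalence <= number of samples in the group.
stopifnot(all(pa.neg <= sum(is.neg)), all(pa.pos <= sum(!is.neg)))

# Group sizes next to the axis maxima you would see on the plot.
group.check <- c(n.neg = sum(is.neg), max.pa.neg = max(pa.neg),
                 n.pos = sum(!is.neg), max.pa.pos = max(pa.pos))
print(group.check)
# n.neg = 3, max.pa.neg = 2, n.pos = 2, max.pa.pos = 1
```

If `n.pos` is much smaller than the number of true samples you expect, the subset used for the y-axis is the first thing to revisit.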

benjjneb avatar Nov 03 '22 14:11 benjjneb

I double-checked, and yes: for ps.pa.pos I had selected the positive controls rather than the true samples, which explains it. Do you think it is good practice to include the positive controls as true samples, or is it better to leave them out? In the end I got the same result: only three taxa were removed.

[Figure: updated prevalence plot]

marwa38 · Nov 03 '22 19:11

From that plot, it looks like almost none of your taxa are also found in your negative controls. So, that seems good.

> do you think it is a good practice to include the positives as true samples or better not to consider?

I think it is fine to have the positive samples included with the true samples for the purpose of "prevalence" testing.
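Pooling the positive controls with the true samples amounts to defining the "true sample" group as every sample that is not a negative control. A minimal base-R sketch of that grouping, assuming a `phase` column like the one in the original code; `"sample"` is a hypothetical label for the true samples (the thread does not say what they are called), and the count matrix is toy data:

```r
# Hypothetical metadata: 3 negative controls, 1 positive control, 2 true samples.
phase <- c("negative", "negative", "negative", "positive", "sample", "sample")
# Hypothetical samples-by-taxa count matrix, one row per entry of `phase`.
counts <- matrix(c(1, 0, 0,
                   0, 3, 0,
                   0, 0, 0,
                   9, 9, 0,
                   4, 0, 2,
                   0, 5, 1),
                 nrow = 6, byrow = TRUE,
                 dimnames = list(paste0("S", 1:6), paste0("ASV", 1:3)))

# Everything that is not a negative control counts as a "true" sample,
# so the positive control is pooled with the real samples.
is.true <- phase != "negative"

# Prevalence (number of samples with count > 0) in each group, per taxon.
df.pa <- data.frame(
  pa.pos = colSums(counts[is.true, , drop = FALSE] > 0),
  pa.neg = colSums(counts[!is.true, , drop = FALSE] > 0)
)
print(df.pa)
```

In the phyloseq code from the original post, the corresponding change would be to build `ps.pa.pos` with `sample_data(ps.pa)$phase != "negative"` instead of `== "positive"`, so the y-axis counts every non-negative sample.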

benjjneb · Nov 03 '22 23:11