decontam
prevalence method figure; Prevalence (Negative Controls) on x-axis against Prevalence (True Samples) on y-axis?
Hello, I used the prevalence method, but the plot I get with the commands below looks different from the one in the tutorial: mine has only a few points, as you can see in the figures attached below. Could you please guide me on this?
contamdf.prev05 <- isContaminant(ps.decon, method="prevalence", neg="is.neg", threshold=0.5)
table(contamdf.prev05$contaminant)
# FALSE TRUE
# 2029 16
# Make phyloseq object of presence-absence in negative controls and true samples
ps.pa <- transform_sample_counts(ps.decon, function(abund) 1*(abund>0))
ps.pa.neg <- prune_samples(sample_data(ps.pa)$phase == "negative", ps.pa)
ps.pa.pos <- prune_samples(sample_data(ps.pa)$phase == "positive", ps.pa)
# Make data.frame of prevalence in positive and negative samples
df.pa <- data.frame(pa.pos=taxa_sums(ps.pa.pos), pa.neg=taxa_sums(ps.pa.neg),
                    contaminant=contamdf.prev05$contaminant)
ggplot(data=df.pa, aes(x=pa.neg, y=pa.pos, color=contaminant)) + geom_point() +
  xlab("Prevalence (Negative Controls)") + ylab("Prevalence (True Samples)")
df.pa.zip
sessionInfo: decontam_1.14.0
[Figure: prevalence plot from the tutorial]
[Figure: my prevalence plot]
many thanks
I'm not sure what your question is?
Oops, sorry, I missed your answer. My figure doesn't show a pattern similar to the one in the decontam tutorial (see the two figures attached above in the post). I'm not sure I understand what the figure is meant to show. What do you think? Could you please comment? Many thanks @benjjneb
Your figure shows such a small number of samples (3 negative controls, 2 real samples) that I don't think decontam is even a useful tool here. You'll need to develop some sort of ad hoc approach to removing contaminants (e.g. removing everything that appears in >2 negative controls), or return to this when you have your full dataset.
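As a rough illustration of the kind of ad hoc filter mentioned above, here is a minimal base R sketch on a made-up count matrix. The names `counts`, `is_neg`, `max_neg`, `flagged` and `counts_clean` are hypothetical and the numbers are invented; this is not a decontam function, just one possible way to drop taxa seen in more than 2 negative controls.

```r
# Toy count matrix (made-up data): rows are samples, columns are taxa.
counts <- matrix(
  c(5, 0, 2, 0,
    3, 0, 0, 1,
    4, 0, 1, 0,
    0, 9, 0, 7,
    0, 8, 3, 6),
  nrow = 5, byrow = TRUE,
  dimnames = list(c("neg1", "neg2", "neg3", "s1", "s2"),
                  c("taxA", "taxB", "taxC", "taxD"))
)

# Hypothetical flag marking which rows are negative controls.
is_neg <- c(TRUE, TRUE, TRUE, FALSE, FALSE)

# Prevalence in negative controls: number of controls in which each taxon is present.
neg_prev <- colSums(counts[is_neg, , drop = FALSE] > 0)

# Flag taxa present in more than `max_neg` negative controls (assumed cutoff).
max_neg <- 2
flagged <- names(neg_prev)[neg_prev > max_neg]

# Remove the flagged taxa from the count table.
counts_clean <- counts[, !(colnames(counts) %in% flagged), drop = FALSE]
```

With a phyloseq object, the same vector of flagged names could presumably be passed on with something like `prune_taxa(!(taxa_names(ps.decon) %in% flagged), ps.decon)`. The cutoff itself is a judgment call and should be adapted to the number of negative controls available.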
thanks for your reply @benjjneb
What do you mean by real samples? My actual samples, or samples that are not contaminated?
Here are my sample numbers:
Your previous figure shows a maximum "Prevalence (True Samples)" of 2, and a maximum "Prevalence (Negative Controls)" of 3. So, either that figure is plotted incorrectly, or there is nearly no overlap between the taxa found in various samples.
I double-checked and, yes, for ps.pa.pos I had selected the positive controls rather than the true samples, which explains the plot. Do you think it is good practice to include the positives as true samples, or is it better to leave them out? In the end I got the same result: only three taxa were removed.
From that plot, it looks like almost none of your taxa are also found in your negative controls. So, that seems good.
do you think it is a good practice to include the positives as true samples or better not to consider?
I think it is fine to have the positive samples included with the true samples for the purpose of "prevalence" testing.
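To make the pooling concrete, here is a small base R sketch that counts prevalence in negative controls versus everything else, so that positive controls are counted together with the true samples. The presence/absence matrix `pa` is made-up data, and the `"sample"` label for true samples is an assumption; only `"negative"` and `"positive"` appear in the code earlier in this thread.

```r
# Toy presence/absence matrix (made-up data): rows are samples, columns are taxa.
pa <- matrix(
  c(1, 0, 0,
    1, 0, 1,
    0, 1, 0,
    0, 1, 1,
    1, 1, 0),
  nrow = 5, byrow = TRUE,
  dimnames = list(c("neg1", "neg2", "pos1", "s1", "s2"),
                  c("taxA", "taxB", "taxC"))
)

# Sample labels; "sample" is a hypothetical label for the true samples.
phase <- c("negative", "negative", "positive", "sample", "sample")
is_neg <- phase == "negative"

# Prevalence per taxon: negatives on one side, positives pooled with true samples on the other.
df_pa <- data.frame(
  pa.neg = colSums(pa[is_neg, , drop = FALSE]),
  pa.pos = colSums(pa[!is_neg, , drop = FALSE])
)
```

In the phyloseq code above, the analogous change would likely be to select the non-negative samples with `prune_samples(sample_data(ps.pa)$phase != "negative", ps.pa)` instead of `phase == "positive"`, assuming `phase` holds only these kinds of labels.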