decontam
all FALSE
decontam_data.xlsx 01decontam.txt

I used your software on my data but got an incorrect result. I am attaching my data and code; could you help me find the reason? The data file is decontam_data.xlsx. It has 5 columns: species, true or false (contamination), sample1, sample2, and control. The code file is 01decontam.txt. Each time I use one sample and the control to identify contamination, I get the wrong result that no species is a contaminant. Thanks for your reply!
Please use the GH comment facilities directly to show your code:
# Within the ``` fences
Right now it is still hard to understand your problem. Thank you for providing files to help though -- they do help.
I am sorry for not describing my problem very well. I have now annotated my code. Because I got a wrong result, please help me check my code or tell me why it doesn't work.
##########
library(phyloseq)
library(ggplot2)
library(decontam)
library(readxl)
OTU <- read_excel("decontam_data.xlsx", sheet = "TRUE")
col <- colnames(OTU)
col1 <- c("truex", "NC", "Species")
sample_col <- setdiff(col, col1)  ## we have two samples; get the sample columns
result <- data.frame(true = OTU$truex)  ## create a new result data frame
####### for every sample, identify contamination
for (sample in sample_col) {
  ## the OTU matrix has two columns: one is the sample, the other is the control
  otu <- as.matrix(OTU[, c(sample, "NC")])
  rownames(otu) <- as.character(OTU$Species)
  colnames(otu) <- c(sample, "NC")
  outmat <- otu_table(otu, taxa_are_rows = TRUE)
  sampledata <- sample_data(data.frame(Sample_or_Control = c("True Sample", "Control Sample"), row.names = c(sample, "NC")))
  physeq1 <- phyloseq(outmat, sampledata)
  sample_data(physeq1)$is.neg <- sample_data(physeq1)$Sample_or_Control == "Control Sample"
  contamdf.prev <- isContaminant(physeq1, method = "prevalence", neg = "is.neg")
  ## write the result for this sample to a file
  write.table(contamdf.prev, file = paste0(sample, "_result.txt"), sep = "\t", row.names = TRUE, col.names = TRUE, quote = FALSE)
  ## select the contaminant column and add it to the result data frame
  result[[sample]] <- contamdf.prev$contaminant
}
write.table(result, file = "result_decontam.txt", sep = "\t", row.names = TRUE, col.names = TRUE, quote = FALSE)
I notice that the prevalence method is based on the chi-square test, so I would like to know how it works on my OTU data. I have two ideas.

One is a chi-square contingency test. For example, for the species "IC002", take the read count in the sample, the read count in the control, the total reads of all species in the sample, and the total reads of all species in the control, construct a contingency table as below, and run the contingency test.

| Species | sample | control |
|---|---|---|
| IC002 | 6,914 | 10 |
| all | 5061210 | 2356923 |

The other is a chi-square goodness-of-fit test: take the read count in the sample, the read count in the control divided by the total reads in the control, the total reads of all species in the sample, and 1, and run the test, as below.

| Species | sample | control |
|---|---|---|
| IC002 | 6,914 | 0.001366 |
| all | 5061210 | 1 |

Do you think my ideas are right? Which one is better? Is this how your tool, decontam, performs the chi-square test on an OTU table?
If I am reading your input data correctly, you have one negative control sample and two positive samples (Zymo mocks).
decontam isn't really intended to be used on such a small number of samples. The power to accurately call contaminants requires larger sample numbers; in the paper we suggest 5 or more negative controls, and at least as many real samples.
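To give a rough sense of why sample number matters, here is a small sketch. It is not decontam's internal code: it uses base R's `fisher.test` as a stand-in for a presence/absence test, and `min_p` is a hypothetical helper name. It computes the smallest one-sided p-value that is possible at all, i.e. for a taxon present in every negative control and absent from every true sample.

```r
# Smallest achievable one-sided p-value for a presence/absence test,
# assuming the most extreme table: taxon present in every negative
# control and absent from every true sample.
# NOTE: illustrative only; Fisher's exact test is used here as a stand-in
# and this is not how decontam computes its scores internally.
min_p <- function(n_neg, n_pos) {
  tab <- matrix(c(n_neg, 0,
                  0,     n_pos),
                nrow = 2, byrow = TRUE,
                dimnames = list(group = c("control", "sample"),
                                taxon = c("present", "absent")))
  fisher.test(tab, alternative = "greater")$p.value
}

min_p(1, 1)  # 1/2   (one control, one sample, as in each loop iteration)
min_p(1, 2)  # 1/3   (one control, two samples)
min_p(5, 5)  # 1/252 (about 0.004)
```

With one control and one or two samples, even a perfect contaminant pattern cannot produce a small p-value, so nothing can fall below the default `threshold = 0.1` of `isContaminant`. With 5 controls and 5 samples there is room for a clear signal.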
> I notice that the prevalence method is based on the chi-square test, so I would like to know how it works on my OTU data. I have two ideas. One is a chi-square contingency test. For example, for the species "IC002", take the read count in the sample, the read count in the control, the total reads of all species in the sample, and the total reads of all species in the control, construct a contingency table as below, and run the contingency test.
Note that the "prevalence" method creates a presence/absence contingency table. So the entries in the table are the number of samples in which a taxon was present, not the overall abundance across all those samples.
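As a minimal sketch of what a presence/absence table looks like, assuming a made-up count matrix (the taxon names, the `sample2` counts, and the helper `prevalence_table` are hypothetical; only the 6914 and 10 echo the IC002 example above):

```r
# Hypothetical read counts: taxa in rows, samples in columns.
counts <- matrix(c(6914, 7200, 10,
                      0,   35, 50),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("taxonA", "taxonB"),
                                 c("sample1", "sample2", "NC")))
is_neg <- c(FALSE, FALSE, TRUE)  # TRUE marks the negative control column

# Reduce counts to presence/absence: any count above zero is "present".
present <- counts > 0

# Cross-tabulate control/sample status against presence for one taxon.
prevalence_table <- function(taxon) {
  table(group   = factor(ifelse(is_neg, "control", "sample"),
                         levels = c("control", "sample")),
        present = factor(present[taxon, ], levels = c(TRUE, FALSE)))
}

prevalence_table("taxonA")
prevalence_table("taxonB")
```

Each table's entries add up to the number of samples (3 here), no matter how many thousands of reads a taxon has. The read counts themselves never enter the test.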
Thank you for all your assistance.
Sent from my iPhone
------------------ Original ------------------ From: Benjamin Callahan @.> Date: Wed,Apr 7,2021 11:28 PM To: benjjneb/decontam @.> Cc: wangqiqi1995 @.>, Author @.> Subject: Re: [benjjneb/decontam] all FALSE (#90)