all FALSE

Open wangqiqi1995 opened this issue 3 years ago • 5 comments

decontam_data.xlsx 01decontam.txt

I have used your software on my data but got an incorrect result. I am attaching my data and code; could you help me find the reason? The data file is decontam_data.xlsx. It has 5 columns: species, true or false (contamination), sample1, sample2, and control. The code file is 01decontam.txt. Each time, I use one sample and the control to identify contamination, and I get a wrong result: every species is called a non-contaminant. Thanks for your reply!

wangqiqi1995 avatar Apr 06 '21 03:04 wangqiqi1995

Please use the GitHub comment facilities to show your code directly:

# Within the ``` fences

Right now it is still hard to understand your problem. Thank you for providing files to help though -- they do help.

benjjneb avatar Apr 06 '21 03:04 benjjneb

I am sorry for not describing my problem very well. I have now annotated my code. Since I got a wrong result, please help me check my code or tell me why it doesn't work.

##########
library(phyloseq)
library(ggplot2)
library(decontam)
library(readxl)
OTU<-read_excel("decontam_data.xlsx" ,sheet="TRUE")
row=as.list(OTU$Species)
col=colnames(OTU)
col1=c("truex","NC","Species")
sample_col=setdiff(col, col1)##we have two sample ,get the sample columns
result=data.frame(true=OTU$truex) ##create a new result dataframe
#######for every sample ,identify contamination
for (sample in sample_col){
  otu     =       as.matrix(OTU[,c(sample,"NC")])  ## otu matrix has two columns: one is the sample, the other is the control
  rownames(otu)=as.list(OTU$Species)
  colnames(otu)=c(sample,"NC")
  outmat= otu_table(otu, taxa_are_rows = TRUE)  
  sampledata = sample_data(data.frame(Sample_or_Control=c("True Sample","Control Sample"),row.names=c(sample,"NC")))
  physeq1<-phyloseq(outmat,sampledata)
  sample_data(physeq1)$is.neg <- sample_data(physeq1)$Sample_or_Control == "Control Sample"
  contamdf.prev <- isContaminant(physeq1, method="prevalence", neg="is.neg") 
  write.table(contamdf.prev , file=paste(as.character(sample),"_result.txt",sep="") , sep ="\t", row.names =TRUE, col.names =TRUE, quote =FALSE) ##write the sample result to a file 
  result[[sample]]=contamdf.prev$contaminant ##select the contamination columns, and add to the result dataframe
}
write.table(result, file="result_decontam.txt", sep ="\t", row.names =TRUE, col.names =TRUE, quote =FALSE)

wangqiqi1995 avatar Apr 06 '21 05:04 wangqiqi1995

I notice that the prevalence method is based on a chi-square test, and I would like to know how it works on my OTU data. I have two ideas.

The first is a chi-square contingency test. For example, for the species "IC002", take the read count in the sample, the read count in the control, the total reads of all species in the sample, and the total reads of all species in the control, construct a contingency table as below, and run the test.

| Species | sample | control |
| --- | --- | --- |
| IC002 | 6,914 | 10 |
| all | 5061210 | 2356923 |

The second is a chi-square goodness-of-fit test: take the read count in the sample, the read count in the control divided by the total read count of the control, the total reads of all species in the sample, and 1, and run the test, as below.

| Species | sample | control |
| --- | --- | --- |
| IC002 | 6,914 | 0.001366 |
| all | 5061210 | 1 |

Do you think my ideas are right? Which one is better? Is this how decontam performs the chi-square test on an OTU table?

wangqiqi1995 avatar Apr 06 '21 06:04 wangqiqi1995

If I am reading your input data correctly, you have one negative control sample and two positive samples (Zymo mocks).

decontam isn't really intended to be used on such a small number of samples. The power to accurately call contaminants requires larger sample numbers; in the paper we suggest 5 negative controls (or more), and at least as many real samples.
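To make the sample-size point concrete, here is a minimal sketch in base R. It uses a one-sided Fisher exact test on a presence/absence table as a stand-in for a prevalence test; it is not decontam's internal code, and the helper name `min_prevalence_p` is hypothetical. It computes the smallest p-value any taxon could reach in the most extreme case: present in every negative control and absent from every true sample.

```r
# Smallest one-sided Fisher exact p-value achievable for a taxon that is
# present in every negative control and absent from every true sample.
# Base R only; an illustration, not decontam's implementation.
min_prevalence_p <- function(n_controls, n_samples) {
  tab <- matrix(
    c(n_controls, 0,    # controls: present, absent
      0, n_samples),    # samples:  present, absent
    nrow = 2, byrow = TRUE,
    dimnames = list(group = c("control", "sample"),
                    status = c("present", "absent"))
  )
  fisher.test(tab, alternative = "greater")$p.value
}

p_one  <- min_prevalence_p(1, 1)  # one sample vs. one control, as in each loop iteration
p_five <- min_prevalence_p(5, 5)  # five controls and five samples
print(c(one_vs_one = p_one, five_vs_five = p_five))
```

With one control and one sample, even the most extreme pattern cannot produce a p-value below 0.5, so under this kind of test no taxon can be separated from chance. With five of each, the same pattern reaches 1/252.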

> I notice that the prevalence method is based on a chi-square test, and I would like to know how it works on my OTU data. I have two ideas. The first is a chi-square contingency test. For example, for the species "IC002", take the read count in the sample, the read count in the control, the total reads of all species in the sample, and the total reads of all species in the control, construct a contingency table as below, and run the test.

Note that the "prevalence" method creates a presence/absence contingency table. So the entries in the table are the number of samples in which a taxon was present, not the overall abundance across all those samples.
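As an illustration of that distinction, the sketch below builds a presence/absence table from a small count matrix in base R. The read counts, the taxon and sample names, and the helper `prevalence_table` are all hypothetical, and the code only shows how such a table is formed, not how decontam computes it internally.

```r
# Hypothetical read counts: taxa in rows, samples in columns (made-up values).
counts <- matrix(
  c(6914, 5200,  10,   0,   3,
       0,   12, 800, 950, 700),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("taxonA", "taxonB"),
                  c("S1", "S2", "NC1", "NC2", "NC3"))
)
is_neg <- c(FALSE, FALSE, TRUE, TRUE, TRUE)  # TRUE marks negative-control columns

# Read depth is discarded here: only "seen at all" vs. "not seen" remains.
present <- counts > 0

# Count, per group, the number of samples in which the taxon was present or absent.
prevalence_table <- function(taxon_name) {
  p <- present[taxon_name, ]
  matrix(
    c(sum(p[is_neg]),  sum(!p[is_neg]),
      sum(p[!is_neg]), sum(!p[!is_neg])),
    nrow = 2, byrow = TRUE,
    dimnames = list(group = c("control", "sample"),
                    status = c("present", "absent"))
  )
}

tab_a <- prevalence_table("taxonA")
tab_b <- prevalence_table("taxonB")
print(tab_a)
print(tab_b)
```

Every cell is a count of samples, so a taxon with 6914 reads in one sample contributes exactly as much as a taxon with 3 reads: one "present".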

benjjneb avatar Apr 07 '21 15:04 benjjneb

Thank you for all your assistance.

wangqiqi1995 avatar Apr 07 '21 15:04 wangqiqi1995