decontam
p.freq differs when using method="frequency" and method="auto/combined/minimum/either/both"
Hi, I noticed that the values calculated for p.freq differ depending on whether I run isContaminant with method = "frequency" alone or with a method that also calculates p.prev ("combined", "minimum", "either", "both"). This is reproducible with the vignette toy dataset. It looks like a bug. Is there any reason to expect this behaviour?
library(phyloseq)
library(ggplot2)
library(decontam)
ps <- readRDS(system.file("extdata", "MUClite.rds", package = "decontam"))
sample_data(ps)$is.neg <- sample_data(ps)$Sample_or_Control == "Control Sample"
contamdf.freq <- isContaminant(ps, method = "frequency", conc = "quant_reading")
contamdf.prev <- isContaminant(ps, method = "prevalence", neg = "is.neg")
contamdf.both <- isContaminant(ps, method = "auto", conc = "quant_reading", neg = "is.neg")
## check if we have obtained the same "p..." values
contamdf.both$OTUID <- rownames(contamdf.both)
contamdf.freq$OTUID <- rownames(contamdf.freq)
contamdf.prev$OTUID <- rownames(contamdf.prev)
test <- merge(contamdf.both, contamdf.freq, by = "OTUID")
test <- merge(test, contamdf.prev, by = "OTUID")
## p.prev values are the same (prevalence-only run vs. "auto" run)
ggplot(test, aes(x = p.prev, y = p.prev.x)) +
  geom_abline(intercept = 0, slope = 1) +
  geom_point()
## some p.freq values differ ("auto" run vs. frequency-only run)
ggplot(test, aes(x = p.freq.x, y = p.freq.y)) +
  geom_abline(intercept = 0, slope = 1) +
  geom_point()
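This is intentional, though poorly documented: when decontam uses the "combined" method (or any other method that uses both frequency and prevalence), the negative controls are excluded from the frequency score calculations. In the frequency-only call above no negative controls are specified, so every sample, including the controls, contributes to p.freq, which would account for the difference.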
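One way to check this explanation on the vignette data is to drop the negative controls by hand, rerun the frequency-only method on the remaining samples, and compare the resulting p.freq with the one from the "auto" call. The following is a minimal sketch, not verified here; the object names ps.true, contamdf.freq.true, and contamdf.both are only illustrative, and it assumes both result tables list the taxa in the same order.

```r
library(phyloseq)
library(decontam)

# Same toy dataset and control flag as in the original report
ps <- readRDS(system.file("extdata", "MUClite.rds", package = "decontam"))
sample_data(ps)$is.neg <- sample_data(ps)$Sample_or_Control == "Control Sample"

# Keep only the non-control samples (illustrative object name)
ps.true <- prune_samples(!sample_data(ps)$is.neg, ps)

# Frequency-only scores computed without the negative controls
contamdf.freq.true <- isContaminant(ps.true, method = "frequency",
                                    conc = "quant_reading")

# Scores from a method that uses both frequency and prevalence
contamdf.both <- isContaminant(ps, method = "auto",
                               conc = "quant_reading", neg = "is.neg")

# If the controls are indeed excluded internally, the two p.freq
# columns should agree (assumes identical taxa order in both tables)
all.equal(contamdf.freq.true$p.freq, contamdf.both$p.freq)
```

If the comparison still reports differences, something other than the exclusion of negative controls is contributing, and that would be worth following up in this thread.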