decontam

p.freq differs when using method="frequency" and method="auto/combined/minimum/either/both"

Open issue, opened by zz77zz (1 comment)

Hi, I noticed that the p.freq values calculated by isContaminant differ depending on whether I run it with method = "frequency" alone or with a method that also calculates p.prev (combined, minimum, either, both). This is reproducible with the vignette's toy dataset. It looks like a bug; is there any reason to expect this behaviour?

library(phyloseq)
library(ggplot2)
library(decontam)

ps <- readRDS(system.file("extdata", "MUClite.rds", package = "decontam"))
sample_data(ps)$is.neg <- sample_data(ps)$Sample_or_Control == "Control Sample"

contamdf.freq <- isContaminant(ps, method = "frequency", conc = "quant_reading")
contamdf.prev <- isContaminant(ps, method = "prevalence", neg = "is.neg")
contamdf.both <- isContaminant(ps, method = "auto", conc = "quant_reading", neg = "is.neg")

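## Merge the three results by taxon to check whether the p-values agree.
## merge() suffixes clashing columns: .x = auto run, .y = frequency-only run,
## and the prevalence-only columns keep their plain names (p.prev, p.freq, ...).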
contamdf.both$OTUID <- rownames(contamdf.both)
contamdf.freq$OTUID <- rownames(contamdf.freq)
contamdf.prev$OTUID <- rownames(contamdf.prev)
test <- merge(contamdf.both, contamdf.freq, by = "OTUID")
test <- merge(test, contamdf.prev, by = "OTUID")

## p.prev agrees: prevalence-only run (p.prev) vs. auto run (p.prev.x)
ggplot(test, aes(x = p.prev, y = p.prev.x)) +
  geom_abline(intercept = 0, slope = 1) +
  geom_point()

## Some p.freq values differ: auto run (p.freq.x) vs. frequency-only run (p.freq.y)
ggplot(test, aes(x = p.freq.x, y = p.freq.y)) +
  geom_abline(intercept = 0, slope = 1) +
  geom_point()

zz77zz, Dec 08 '22 01:12

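This is a poorly documented but intentional behavior of decontam: when using "combined" (or any other method that uses both frequency and prevalence), the negative-control samples are excluded from the frequency score calculation. The p.freq reported by those methods is therefore computed on fewer samples than with method = "frequency" alone, so the two can differ.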

benjjneb, Dec 14 '22 03:12
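One way to check the explanation above is to rerun the frequency-only method on the true samples alone and compare its p.freq with the one from method = "combined". The sketch below assumes the same MUClite toy dataset and is.neg flag as in the issue, and uses phyloseq's prune_samples() to drop the negative controls; the object names contamdf.comb, ps.noneg and contamdf.freq.noneg are illustrative only. If excluding the negatives is the only difference between the two code paths, all.equal() should report no differences; any remaining mismatch would point to something else.

```r
library(phyloseq)
library(decontam)

# Same toy dataset and negative-control flag as in the issue
ps <- readRDS(system.file("extdata", "MUClite.rds", package = "decontam"))
sample_data(ps)$is.neg <- sample_data(ps)$Sample_or_Control == "Control Sample"

# Combined score, with negative controls supplied via neg
contamdf.comb <- isContaminant(ps, method = "combined",
                               conc = "quant_reading", neg = "is.neg")

# Frequency-only score computed on the non-control samples alone
# (prune_samples drops samples but keeps every taxon)
ps.noneg <- prune_samples(!sample_data(ps)$is.neg, ps)
contamdf.freq.noneg <- isContaminant(ps.noneg, method = "frequency",
                                     conc = "quant_reading")

# Rows are taxa in the same order in both results, so p.freq can be
# compared position by position without a merge
all.equal(contamdf.comb$p.freq, contamdf.freq.noneg$p.freq)
```

Comparing with all.equal() rather than identical() tolerates tiny floating-point differences between the two runs.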