phyloseq
Scoping issues in subset_taxa (and probably other similar functions)
The function subset_taxa() (and probably other similar functions that take the subset expression through ...) can only resolve the objects referenced in that expression if they are visible from the global environment. So, for example, subsetting a phyloseq object inside a function where the criteria are defined as local variables won't work. See this simple example:
library(phyloseq)
data(GlobalPatterns)
# doesn't work, because `phyla` isn't in global scope
do_subset <- function(ps) {
  phyla <- c('Crenarchaeota','Euryarchaeota','Planctomycetes')
  subset_taxa(ps, Phylum %in% phyla)
}
do_subset(GlobalPatterns)
#> Error in h(simpleError(msg, call)): error in evaluating the argument 'table' in selecting a method for function '%in%': object 'phyla' not found
# runs without error because `phyla` now exists in the global scope, but the
# subset_taxa() call uses the global version rather than the one inside the function
phyla <- c('Nitrospirae','Gemmatimonadetes','Fusobacteria')
thing <- do_subset(GlobalPatterns)
f <- as.data.frame(tax_table(thing))
# this contains the phyla from the global `phyla` object
unique(f$Phylum)
#> [1] "Fusobacteria" "Gemmatimonadetes" "Nitrospirae"
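To see why the global phyla wins, here is a base-R sketch that needs no phyloseq. It assumes that subset_taxa() converts the taxonomy table to a data.frame and forwards its ... to base subset(), which is what phyloseq's source appears to do; my_subset(), do_my_subset(), taxa and wanted are hypothetical stand-ins. subset() captures the expression and evaluates it first among the data frame's columns, then in the environment of whoever called subset() and that environment's parents. The caller here is my_subset() (for phyloseq it would be subset_taxa(), whose parents run through the package namespace and eventually reach the global environment), so the local variables of the function that built the expression are never searched.

```r
# Hypothetical stand-in for subset_taxa(): forwards ... straight to subset(),
# assuming this mirrors what phyloseq does with the taxonomy table
my_subset <- function(df, ...) {
  subset(df, ...)
}

taxa <- data.frame(
  Phylum = c("Crenarchaeota", "Nitrospirae", "Fusobacteria"),
  stringsAsFactors = FALSE
)

do_my_subset <- function(df) {
  wanted <- "Crenarchaeota"  # local variable: subset() never looks here
  my_subset(df, Phylum %in% wanted)
}

# No global `wanted` yet, so the lookup fails and we capture the message
res1 <- tryCatch(do_my_subset(taxa), error = function(e) conditionMessage(e))
print(res1)

# Once a global `wanted` exists, it is silently used instead of the local one
wanted <- c("Nitrospirae", "Fusobacteria")
res2 <- do_my_subset(taxa)
print(res2$Phylum)
```

This reproduces both halves of the report: an "object not found" error when nothing global matches, and the wrong (global) criteria being applied when something does.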
I ran into this because I was trying to use the purrr map() functions to take different subsets of a phyloseq object and do things with each one, like this:
library(purrr)  # map(); purrr also re-exports %>%
phyla_subsets <- list(
  a = c('Crenarchaeota','Euryarchaeota','Planctomycetes'),
  b = c('Nitrospirae','Gemmatimonadetes','Fusobacteria')
)
phyla_subsets %>%
  map(~{
    ps <- GlobalPatterns %>%
      subset_taxa(Phylum %in% .x)
    # ... now do something with ps ...
  })
# fails: .x exists only inside the lambda, so subset_taxa() cannot see it
I can copy .x into a global variable with the <<- operator, but that feels very kludgy and may affect other things down the line.
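One way to avoid both the error and <<- is to skip the expression-capturing step altogether: work out which taxa to keep inside the lambda, where .x is visible, and hand that to prune_taxa(), whose taxa argument is evaluated normally. This is a sketch, assuming the rank column is called Phylum as in GlobalPatterns; ps_list, phylum and keep are illustrative names, not part of phyloseq.

```r
library(phyloseq)
library(purrr)  # map(); purrr also re-exports %>%
data(GlobalPatterns)

phyla_subsets <- list(
  a = c("Crenarchaeota", "Euryarchaeota", "Planctomycetes"),
  b = c("Nitrospirae", "Gemmatimonadetes", "Fusobacteria")
)

ps_list <- phyla_subsets %>%
  map(~ {
    # Phylum column as a named character vector (names are the taxa IDs)
    phylum <- as(tax_table(GlobalPatterns), "matrix")[, "Phylum"]
    # .x is used here, in the lambda's own scope; NA phyla simply give FALSE
    keep <- names(phylum)[phylum %in% .x]
    # prune_taxa() receives an already-evaluated vector of taxa names
    prune_taxa(keep, GlobalPatterns)
  })
```

Selecting by taxa names rather than by a positional logical vector means the result does not depend on the tax_table rows being in the same order as the other components.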
This is a persistent issue that unfortunately has not been resolved. Both subset_samples() and subset_taxa() (and possibly other functions as well) have this scoping problem and do not see variables defined inside the calling function.
The workaround is either to use the <<- operator or to assign the variables in the global environment before calling the function, e.g.:
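library(phyloseq)
data(GlobalPatterns)
# global copy: this is the one subset_samples() actually sees
sample_type <- "Ocean"
subset_function <- function(physeq_obj = GlobalPatterns, sample_type) {
  # sample_type here resolves to the global variable, not this argument
  subset_samples(physeq_obj, SampleType == sample_type)
}
subset_function(GlobalPatterns, sample_type = "Ocean")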
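Instead of relying on a global copy, another option (a sketch, not taken from the thread) is to splice the local value into the call with base R's bquote() and evaluate that call inside the function. subset_samples() then receives a literal condition such as SampleType == "Ocean" and has nothing left to look up. subset_function and physeq_obj follow the reply above; cl and ocean are illustrative names. The same idea should carry over to subset_taxa(), e.g. Phylum %in% .(phyla).

```r
library(phyloseq)
data(GlobalPatterns)

subset_function <- function(physeq_obj = GlobalPatterns, sample_type) {
  # bquote() replaces .(sample_type) with the argument's current value, so the
  # call becomes e.g. subset_samples(physeq_obj, SampleType == "Ocean")
  cl <- bquote(subset_samples(physeq_obj, SampleType == .(sample_type)))
  # eval() defaults to the calling frame (this function's environment),
  # which is where physeq_obj is found
  eval(cl)
}

# No global sample_type is needed
ocean <- subset_function(GlobalPatterns, sample_type = "Ocean")
```

Because the value is baked into the expression before subset_samples() sees it, a global variable with the same name can no longer override the argument.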