mia
mia copied to clipboard
Add tree collapse to subsetByPrevalentTaxa and other related methods
Feature subsetting methods include at least subsetByRareTaxa
and subsetByPrevalentTaxa
.
The problem is that these do not collapse the rowTree
. That has to be done separately:
# Add subset by leaf here
library(mia)
library(philr)
library(tidyr)
data(GlobalPatterns, package="mia")
## Select prevalent taxa
tse <- GlobalPatterns %>% subsetByPrevalentTaxa(
detection = 0, prevalence = 90/100, as_relative=FALSE)
# Collapse the tree
tree <- ape::keep.tip(phy = rowTree(tse), tip = rowLinks(tse)$nodeNum)
rowTree(tse) <- tree
# This gives:
#> length(rowTree(tse)$node.label)
#[1] 226
#> length(rowTree(GlobalPatterns)$node.label)
#[1] 19215
However, tree collapsing is available through TreeSummarizedExperiment::subsetByLeaf
, if one does subsetting by specifying taxa.
# Alternative way
taxa <- getPrevalentTaxa(GlobalPatterns, detection = 0, prevalence = 90/100)
tse2 <- subsetByLeaf(x = GlobalPatterns, rowLeaf = taxa)
# This gives:
#> length(rowTree(tse2)$node.label)
#[1] 226
It would be convenient if the tree collapse step could be directly included in the dedicated subsetting functions such as subsetByPrevalentTaxa
and subsetByRareTaxa
.
Both of them use the tse[ , ]
mechanism for subsetting internally. Ideally, that could be modified into something like tse[ , , collapseTree=TRUE]
. If this is not feasible, one could use TreeSummarizedExperiment::subsetByLeaf
to allow tree collapse within those functions.
If we do not want to allow tree collapse as part of these subsetting functions, then I propose to remove these subsetting functions entirely, and point users to doing this in two steps: 1) identify the feature subset; 2) do subsetting (and optionally collapse the tree). As in the latter example above.
Good suggestion, but the wrong package. The [
is part of the TreeSummarizedExperiment
package.
However, this subject was brought up before: https://github.com/fionarhuang/TreeSummarizedExperiment/issues/34
edit: The summary of the discussion in that thread: Mutliple rows can be linked to a leaf. Therefore, automatic trimming is not really an option
If we do not want to allow tree collapse as part of these subsetting functions, then I propose to remove these subsetting functions entirely, and point users to doing this in two steps: 1) identify the feature subset; 2) do subsetting (and optionally collapse the tree). As in the latter example above.
We discussed this at some point on slack and you convinced me add some of the subset functions as a legacy function matching phyloseq functionality. I am happy to remove them.
How about providing an option to collapse the tree (default FALSE)? I assume that this is possible in many practical cases. And when it is not, the function could provide error and an informative message that informs the user that it is not possible to collapse the tree due to the above mentioned reason? Or I could suggest to add this to TreeSE [
operation.
@TuomasBorman it seems to me that this has already been dealt with in the current updated methods?