mia icon indicating copy to clipboard operation
mia copied to clipboard

Add tree collapse to subsetByPrevalentTaxa and other related methods

Open antagomir opened this issue 3 years ago • 3 comments

Feature subsetting methods include at least subsetByRareTaxa and subsetByPrevalentTaxa.

The problem is that these do not collapse the rowTree. That has to be done separately:

# Add subset by leaf here
library(mia)
library(philr)
library(tidyr)
data(GlobalPatterns, package="mia")
## Select prevalent taxa 
tse <-  GlobalPatterns %>% subsetByPrevalentTaxa(
          detection = 0, prevalence = 90/100, as_relative=FALSE)
# Collapse the tree
tree <- ape::keep.tip(phy = rowTree(tse), tip = rowLinks(tse)$nodeNum)
rowTree(tse) <- tree

# This gives:
#> length(rowTree(tse)$node.label)
#[1] 226
#> length(rowTree(GlobalPatterns)$node.label)
#[1] 19215

However, tree collapsing is available through TreeSummarizedExperiment::subsetByLeaf, if one does subsetting by specifying taxa.

# Alternative way
taxa <- getPrevalentTaxa(GlobalPatterns, detection = 0, prevalence = 90/100)
tse2 <- subsetByLeaf(x = GlobalPatterns, rowLeaf = taxa)
# This gives:
#> length(rowTree(tse2)$node.label)
#[1] 226

It would be convenient if the tree collapse step could be directly included in the dedicated subsetting functions such as subsetByPrevalentTaxa and subsetByRareTaxa.

Both of them use the tse[ , ] mechanism for subsetting internally. Ideally, that could be modified into something like tse[ , , collapseTree=TRUE]. If this is not feasible, one could use TreeSummarizedExperiment::subsetByLeaf to allow tree collapse within those functions.

If we do not want to allow tree collapse as part of these subsetting functions, then I propose to remove these subsetting functions entirely, and point users to doing this in two steps: 1) identify the feature subset; 2) do subsetting (and optionally collapse the tree). As in the latter example above.

antagomir avatar Jan 01 '22 13:01 antagomir

Good suggestion, but the wrong package. The [ is part of the TreeSummarizedExperiment package.

However, this subject was brought up before: https://github.com/fionarhuang/TreeSummarizedExperiment/issues/34

edit: The summary of the discussion in that thread: Mutliple rows can be linked to a leaf. Therefore, automatic trimming is not really an option

If we do not want to allow tree collapse as part of these subsetting functions, then I propose to remove these subsetting functions entirely, and point users to doing this in two steps: 1) identify the feature subset; 2) do subsetting (and optionally collapse the tree). As in the latter example above.

We discussed this at some point on slack and you convinced me add some of the subset functions as a legacy function matching phyloseq functionality. I am happy to remove them.

FelixErnst avatar Jan 01 '22 17:01 FelixErnst

How about providing an option to collapse the tree (default FALSE)? I assume that this is possible in many practical cases. And when it is not, the function could provide error and an informative message that informs the user that it is not possible to collapse the tree due to the above mentioned reason? Or I could suggest to add this to TreeSE [ operation.

antagomir avatar Jan 05 '22 15:01 antagomir

@TuomasBorman it seems to me that this has already been dealt with in the current updated methods?

antagomir avatar Jul 24 '24 08:07 antagomir