agglomerateByRank enhancement

Open antagomir opened this issue 1 year ago • 0 comments

Is your feature request related to a problem? Please describe.

The agglomerateByRank function only accepts ranks listed in TAXONOMY_RANKS of the mia package. This prevents agglomeration to levels such as ASV / OTU or other non-standard taxonomic ranks, although this is frequently needed. One example is the ANCOMBC::ancombc2 function, whose tax_level argument cannot be used directly for ASV-level calculations, a common task (see ANCOM issue #174). It would often be handy to allow operations on non-standard taxonomic levels as well.

Describe the solution you'd like

When the grouping variable can be interpreted as a taxonomic level, one would call agglomerateByRank, which also takes into account the hierarchy in the rows and rowData (taxonomic table, taxonomic tree). Ideally, non-standard levels would be handled by the same agglomerateByRank function. Currently the function has a safeguard that accepts only known taxonomic levels.

Agglomeration takes the hierarchy of the taxonomy into account, so all taxonomic ranks must be defined for this function. For example, if "OTU" were added, the entire hierarchy would need to be given: TAXONOMY_RANKS <- c(TAXONOMY_RANKS, "OTU").

Suggested solution: make it possible for users to specify their own taxonomic ranks, either via a system-wide definition of TAXONOMY_RANKS or via a function argument. Also improve the examples.
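
To illustrate the "function argument" variant, here is a minimal, language-neutral sketch in Python. It is not the mia implementation: the function name agglomerate_by_rank, the ranks argument, the DEFAULT_RANKS tuple and the toy data are all assumptions made up for illustration. The point is only that the rank check can run against a user-supplied, ordered rank list while the grouping still uses the whole lineage down to the requested rank.

```python
from collections import defaultdict

# Assumed default rank order, highest to lowest (illustrative; not read from mia).
DEFAULT_RANKS = ("kingdom", "phylum", "class", "order", "family", "genus", "species")


def agglomerate_by_rank(taxonomy, counts, rank, ranks=DEFAULT_RANKS):
    """Sum counts over rows that share the same lineage down to `rank`.

    taxonomy: dict mapping row id -> dict mapping rank name -> label
    counts:   dict mapping row id -> count
    ranks:    ordered rank names, highest first; callers may pass their own
    """
    if rank not in ranks:
        raise ValueError(f"unknown rank {rank!r}; expected one of {list(ranks)}")
    # Keep every rank from the top of the hierarchy down to the requested one.
    levels = ranks[: ranks.index(rank) + 1]
    totals = defaultdict(int)
    for row_id, lineage in taxonomy.items():
        key = tuple(lineage.get(level) for level in levels)
        totals[key] += counts[row_id]
    return dict(totals)


# Hypothetical toy data with a non-standard lowest rank, "otu".
taxonomy = {
    "r1": {"family": "Lachnospiraceae", "genus": "Blautia", "otu": "OTU_1"},
    "r2": {"family": "Lachnospiraceae", "genus": "Blautia", "otu": "OTU_2"},
    "r3": {"family": "Lachnospiraceae", "genus": "Roseburia", "otu": "OTU_3"},
}
counts = {"r1": 5, "r2": 7, "r3": 2}

# The caller declares the full hierarchy, including the extra rank.
custom_ranks = ("family", "genus", "otu")
by_genus = agglomerate_by_rank(taxonomy, counts, "genus", ranks=custom_ranks)
by_otu = agglomerate_by_rank(taxonomy, counts, "otu", ranks=custom_ranks)
```

With the default rank list, the call with rank "otu" is rejected. That mirrors the current safeguard. Passing custom_ranks keeps the check but leaves the choice of ranks to the user.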

Describe alternatives you've considered

  1. Hard-code new standard taxonomy ranks (e.g. OTU / ASV) in TAXONOMY_RANKS. This would still not allow general use of the function for non-standard taxonomic levels and is therefore not an ideal solution.
  2. Switch off the requirement of using TAXONOMY_RANKS in agglomerateByRank. This is not possible, since the agglomeration uses the full hierarchy and not individual fields.
  3. Use a generic function such as mergeRows. This is not possible, since the agglomeration uses the full hierarchy and not individual fields.
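
Why alternatives 2 and 3 fall short can be shown with a small sketch (Python, made-up data, not mia code). The assumption here is that a generic merge groups rows by the labels of one column only. Two rows that share a genus label but belong to different families are then collapsed into one group, whereas grouping by the lineage keeps them apart.

```python
from collections import Counter

# Hypothetical rows: (family, genus, count). The genus label "uncultured"
# occurs under two different families.
rows = [
    ("Lachnospiraceae", "uncultured", 4),
    ("Ruminococcaceae", "uncultured", 6),
    ("Ruminococcaceae", "Faecalibacterium", 3),
]

by_field = Counter()    # grouping on the genus column alone
by_lineage = Counter()  # grouping on the lineage (family, genus)
for family, genus, count in rows:
    by_field[genus] += count
    by_lineage[(family, genus)] += count
```

Here by_field ends up with two groups, because both "uncultured" rows are merged, while by_lineage keeps three.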

Additional context

Users should be able to make the final choice; the current implementation of agglomerateByRank is therefore too restrictive.

Other functions should be checked for similar restrictions.

antagomir · Jul 11 '23 22:07