mia
Importer for Biobakery outputs: HUMAnN 3
HUMAnN 3 provides functional predictions for metagenome profiles. An importer to MAE or altExp in mia would be useful as this is a common format.
Later in this page there is one import code example.
Some example data is on the way, for a closer look.
These are functional predictions based on metagenome profiles; they are not functional measurements (e.g. metabolites). Hence I am thinking that altExp might also be suitable, since it is another view of the same data (metagenome) from which we pull taxonomic abundance profiles as well. Conceptually, MAE could be suitable since taxonomic and functional profiles are two different data types, even if derived from the same source. I would tend to choose the latter (MAE).
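To make the two options concrete, here is a minimal sketch with toy data. The matrices, feature names and experiment names are made up for illustration, and SingleCellExperiment is used as a stand-in for TreeSE (which extends it), so this is not how mia necessarily does it.

```r
library(SummarizedExperiment)
library(SingleCellExperiment)
library(MultiAssayExperiment)

# Toy data (hypothetical): 2 taxa and 3 pathways measured in the same 3 samples
samples <- c("S1", "S2", "S3")
taxa_counts <- matrix(c(10, 0, 5, 3, 8, 1), nrow = 2, byrow = TRUE,
                      dimnames = list(c("g__Bacteroides", "g__Prevotella"), samples))
pathway_abund <- matrix(c(1.5, 0.2, 0.9, 0.0, 2.1, 0.4, 0.3, 0.3, 0.1),
                        nrow = 3, byrow = TRUE,
                        dimnames = list(c("PWY-A", "PWY-B", "PWY-C"), samples))

taxa <- SingleCellExperiment(assays = list(counts = taxa_counts))
func <- SummarizedExperiment(assays = list(abundance = pathway_abund))

# Option 1: altExp, the functional profile as another view of the same samples
tse_with_alt <- taxa
altExp(tse_with_alt, "pathways") <- func

# Option 2: MAE, taxonomy and function as two separate experiments
mae <- MultiAssayExperiment(
  experiments = ExperimentList(taxonomy = taxa, pathways = func)
)
```

With altExp the columns must match one-to-one; with MAE the sample matching goes through the sampleMap, so the two experiments may cover different sample sets.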
I have been thinking about this also. The HUMAnN 3 output has one key aspect that is often underestimated.
Example data:
There is function-species linkage information that can be viewed in two ways:
This is also the case with genome-resolved metagenomics where we have MAGs and pathway information for each of the MAGs across samples. So this is a general aspect which needs attention.
How can we store such information? A joint feature-microbe identifier in a single column is not always the best form for analysis.
This is more like single-cell data, where pathway information for each microbe is available for every sample. Moreover, many pathways can be unique to specific microbes. But usually we end up summing pathways per sample, thereby losing information about which microbes contribute to these functions. In a biological sense this is a crucial aspect, considering the high functional redundancy in microbiomes.
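A small base R sketch of the point above, assuming the joint identifier has the form "pathway|taxon". The table below is invented toy data, not real HUMAnN output. It splits the joint column, then shows that per-pathway totals hide which microbes contribute.

```r
# Toy stratified table (hypothetical values and names)
strat <- data.frame(
  feature = c("PWY-A|g__Bacteroides.s__B_one",
              "PWY-A|g__Prevotella.s__P_two",
              "PWY-B|g__Bacteroides.s__B_one",
              "PWY-B|unclassified"),
  S1 = c(2.0, 1.0, 0.5, 0.1),
  S2 = c(0.0, 3.0, 0.7, 0.2),
  stringsAsFactors = FALSE
)

# Split the joint identifier into separate pathway and taxon columns
parts <- strsplit(strat$feature, "|", fixed = TRUE)
strat$pathway <- vapply(parts, function(p) p[1], character(1))
strat$taxon   <- vapply(parts, function(p) p[2], character(1))

# Summing over taxa gives per-pathway totals, but the contributors are lost
totals <- rowsum(as.matrix(strat[, c("S1", "S2")]), group = strat$pathway)

# Keeping the stratification: pathway x taxon contributions in sample S2
by_taxon_S2 <- xtabs(S2 ~ pathway + taxon, data = strat)
```

Here PWY-A has the same total in S1 and S2, yet in S2 it comes entirely from one taxon, which is exactly the information that summing throws away.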
During my own analysis, for instance, I found pathways that are interesting and then looked at which microbes contributed to these pathways and found interesting patterns in bacterial contributions. I have been thinking about this but no eureka moment or maybe I am just overthinking here :P
It is important, and we must learn as we go. I have not seen comprehensive R-based solutions to bring these levels together, and SE/MAE is a promising framework, albeit not necessarily the final one. The MAE container does not require that features are matched. Additional information linking the features (rows), i.e. genes, pathways, taxa, between MAE experiments is needed in many analyses and can be added through rowData, or in experiment metadata?
The sampleMap mechanism allows more complex matchings between colData and the individual experiments in MAE, but for features an equivalent might be missing.
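One way this could look, as a rough sketch: the taxon of each stratified functional row goes into rowData, and a feature link table goes into the MAE metadata. The `feature_links` entry and its column names are my own invention here, not an existing MAE convention, and the data are toy values.

```r
library(SummarizedExperiment)
library(MultiAssayExperiment)

samples <- c("S1", "S2")

# Taxonomic experiment (toy counts)
taxa <- SummarizedExperiment(
  assays = list(counts = matrix(c(10, 4, 0, 7), nrow = 2,
                                dimnames = list(c("s__B_one", "s__P_two"), samples)))
)

# Stratified functional experiment: one row per pathway-taxon pair,
# with the pathway and the contributing taxon kept in rowData
strat_ids <- c("PWY-A|s__B_one", "PWY-A|s__P_two", "PWY-B|s__B_one")
func <- SummarizedExperiment(
  assays = list(abundance = matrix(c(2, 1, 0.5, 0, 3, 0.7), nrow = 3,
                                   dimnames = list(strat_ids, samples))),
  rowData = DataFrame(pathway = c("PWY-A", "PWY-A", "PWY-B"),
                      taxon = c("s__B_one", "s__P_two", "s__B_one"))
)

# Hypothetical link table between rows of the two experiments
links <- DataFrame(functional_row = rownames(func),
                   taxonomic_row = rowData(func)$taxon)

mae <- MultiAssayExperiment(
  experiments = ExperimentList(taxonomy = taxa, pathways = func),
  metadata = list(feature_links = links)
)

# Look up the functional rows contributed by one taxon
hits <- metadata(mae)$feature_links
rows <- hits$functional_row[hits$taxonomic_row == "s__P_two"]
contrib <- assay(mae[["pathways"]], "abundance")[rows, , drop = FALSE]
```

Nothing in MAE keeps such a table in sync when experiments are subset, which is the missing piece compared to sampleMap.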
This requires an additional class to be defined, if such a class is not available in BioC, since MAE links samples, not features, as @antagomir pointed out.
The requirements would be as follows:
- To be compatible with MAEs it would need to extend from TSE
- A hard-coded alternative TSE slot to hold the "mirror" data, also as a TSE
- A hard-coded slot for linking data (also allowing for non-linked data?)
- An invert function would need to be added to switch between species and gene data
- A getter/setter pair for the alternative data slot
- All the necessary reimplementation of functions from the TSE, SCE and SE universe (This is not hard, but probably a bit of work: Each call would need to be applied twice to data and the alternative data and the result recombined)
The downside would be that it would allow only two types of data mirroring each other, and not a huge number of data types like the MAE.
However, I think this can be rationalized in this instance, since the number of samples has to be equal in both cases (this limitation is not imposed by MAE) and the type of data is very specific to microbiome data analysis. I would call the class MicrobiomeExperiment 😆 🤣
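Just to make the requirements tangible, a very rough S4 sketch. Everything here is hypothetical: the class name `MicrobiomeExperimentSketch`, the `mirror` and `links` slots, and the `mirror()` and `invert()` generics do not exist anywhere. It also extends SummarizedExperiment rather than TSE to keep the example small, and it skips the setter and the reimplementation of the SE/SCE/TSE methods.

```r
library(SummarizedExperiment)

# Hypothetical class: primary data plus a "mirror" experiment and a link table
setClass(
  "MicrobiomeExperimentSketch",
  contains = "SummarizedExperiment",
  slots = c(mirror = "SummarizedExperiment", links = "data.frame")
)

# Both views must describe the same samples
setValidity("MicrobiomeExperimentSketch", function(object) {
  if (!identical(colnames(object), colnames(object@mirror))) {
    return("primary and mirror data must share the same samples")
  }
  TRUE
})

# Getter for the mirror data
setGeneric("mirror", function(x) standardGeneric("mirror"))
setMethod("mirror", "MicrobiomeExperimentSketch", function(x) x@mirror)

# invert(): swap primary and mirror data, keeping the links
setGeneric("invert", function(x) standardGeneric("invert"))
setMethod("invert", "MicrobiomeExperimentSketch", function(x) {
  primary <- as(x, "SummarizedExperiment")
  new("MicrobiomeExperimentSketch", x@mirror,
      mirror = primary, links = x@links)
})

# Toy usage: 2 species and 3 genes in the same 2 samples
samples <- c("S1", "S2")
species <- SummarizedExperiment(
  assays = list(counts = matrix(1:4, nrow = 2,
                                dimnames = list(c("sp1", "sp2"), samples))))
genes <- SummarizedExperiment(
  assays = list(counts = matrix(1:6, nrow = 3,
                                dimnames = list(c("g1", "g2", "g3"), samples))))
links <- data.frame(gene = c("g1", "g2", "g3"),
                    species = c("sp1", "sp1", "sp2"))

me <- new("MicrobiomeExperimentSketch", species, mirror = genes, links = links)
flipped <- invert(me)
```

After `invert()` the gene data is the primary view and the species data sits in the mirror slot; the link table travels along unchanged.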
Whoa! Well this could be useful and valuable. It is also some work. Let's see how we get there.. PRs welcome! :-)
Maybe one thing to still consider more carefully before jumping into it: if there are alternative (completely different?) solutions for operating in this space, or if the broader SE community is working on this already.
Related to #383
Also related to #306 #308
Does mia::importHUMAnN() solve this one already (can we close)?
It imports a single HUMAnN file into TreeSE. That might be the best solution currently. The HUMAnN output has species information that is stored in rowData, but single HUMAnN/MetaPhlAn files are not linked, if that was the idea.
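For reference, a minimal usage sketch. The file path is a placeholder, not a file shipped with mia, so the call is guarded and only runs if mia is installed and the file exists.

```r
# Placeholder path (hypothetical); point this to your own HUMAnN output table
humann_file <- "path/to/humann_pathabundance.tsv"

if (requireNamespace("mia", quietly = TRUE) && file.exists(humann_file)) {
  # Import a single HUMAnN file into a TreeSE
  tse <- mia::importHUMAnN(humann_file)
  # Species information from the stratified rows ends up in rowData
  print(head(SummarizedExperiment::rowData(tse)))
}
```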
Yes, two different issues:
1. importing functional predictions with importHUMAnN() into TreeSE or similar
2. linking taxonomic and functional data via MAE
It seems that we have solved (1) satisfactorily now.
The second issue remains open. Not sure if it is feasible to provide a general solution.
However, we could transfer the issue to OMA and demonstrate how to use MAE (or altExp might work even better as the samples match one-to-one) in linking the two types of data.