Importer for Biobakery outputs: HUMAnN 3

Open antagomir opened this issue 3 years ago • 9 comments

HUMAnN 3 provides functional predictions for metagenome profiles. An importer to MAE or altExp in mia would be useful as this is a common format.

Later in this page there is one import code example.

Some example data is on the way, for a closer look.

These are functional predictions based on metagenome profiles; they are not functional measurements (e.g. metabolites). Hence I am thinking that altExp might also be suitable, since it is another view of the same data (the metagenome) from which we also pull taxonomic abundance profiles. Conceptually, MAE could be suitable since taxonomic and functional profiles are two different data types, even if derived from the same source. I would tend to choose the latter (MAE).

antagomir avatar Dec 22 '21 22:12 antagomir

I have been thinking about this too. The HUMAnN 3 output has one key aspect that is often underestimated. Example data: (image)

There is function-species linkage information that can be viewed in two ways: (image)

This is also the case with genome-resolved metagenomics where we have MAGs and pathway information for each of the MAGs across samples. So this is a general aspect which needs attention.

How can we store such information? Having the feature and the microbe joined in a single column is not always the best for analysis. This is more like single-cell data, where pathway information for each microbe is available for every sample. Moreover, many pathways can be unique to specific microbes. But usually we end up summing pathways per sample, thereby losing information about which microbes contribute to these functions. In a biological sense this is a crucial aspect, considering the high functional redundancy in microbiomes.
During my own analysis, for instance, I found interesting pathways, then looked at which microbes contributed to them, and found interesting patterns in bacterial contributions. I have been thinking about this but have had no eureka moment, or maybe I am just overthinking here :P
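
To make the two views concrete, here is a minimal sketch in plain Python (not mia code). It assumes the `function|taxon` row naming of HUMAnN stratified tables; the pathway IDs, sample names, values and helper names (`split_feature`, `to_long`, `sum_by`, `contributors`) are all made up for illustration. It shows that summing per function drops the taxon, while a long format keeps both keys.

```python
from collections import defaultdict

# Toy stratified rows, "<function>|<taxon>" -> per-sample abundance.
# All identifiers and numbers are invented for illustration.
stratified = {
    "PWY-A|g__Bacteroides.s__Bacteroides_vulgatus": {"S1": 4.0, "S2": 0.0},
    "PWY-A|g__Escherichia.s__Escherichia_coli": {"S1": 1.0, "S2": 3.0},
    "PWY-B|g__Escherichia.s__Escherichia_coli": {"S1": 2.0, "S2": 2.0},
    "PWY-B|unclassified": {"S1": 0.5, "S2": 0.0},
}


def split_feature(feature_id):
    """Split 'function|taxon'; rows without '|' get taxon None."""
    function, sep, taxon = feature_id.partition("|")
    return function, (taxon if sep else None)


def to_long(table):
    """Flatten to (function, taxon, sample, value) records."""
    records = []
    for feature_id, samples in table.items():
        function, taxon = split_feature(feature_id)
        for sample, value in samples.items():
            records.append((function, taxon, sample, value))
    return records


def sum_by(records, key_index):
    """Sum values per (key, sample); key_index 0 = function, 1 = taxon."""
    totals = defaultdict(float)
    for record in records:
        totals[(record[key_index], record[2])] += record[3]
    return dict(totals)


def contributors(records, function, sample):
    """Taxa with a non-zero contribution to one function in one sample."""
    return {
        taxon: value
        for fn, taxon, smp, value in records
        if fn == function and smp == sample and value > 0
    }


records = to_long(stratified)
by_function = sum_by(records, 0)  # view 1: taxon information is lost
by_taxon = sum_by(records, 1)     # view 2: function information is lost
```

Calling `contributors(records, "PWY-A", "S1")` recovers the per-taxon breakdown that `by_function` no longer contains, which is the information the question above is about.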

microsud avatar Dec 23 '21 07:12 microsud

It is important, and we must learn while we go. I have not seen comprehensive R-based solutions to bring these levels together, and SE/MAE is a promising framework albeit not necessarily the final one. The MAE container does not require that features are matched. Additional information linking the features (rows), i.e. genes, pathways, taxa between MAE experiments is needed in many analyses and can be added through rowData, or in experiment metadata?

The sampleMap mechanism allows more complex matchings between colData and the individual experiments in MAE but for features this might be missing.
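
As a rough illustration of what a feature-level counterpart to sampleMap could look like, here is a sketch in plain Python. The "feature map" below is hypothetical and not part of MAE or mia; the experiment names, feature IDs and the helper `linked_features` are invented for the example.

```python
# Hypothetical "feature map": one row per link between a feature in one
# experiment and a feature in another. Nothing like this exists in MAE;
# the layout simply mirrors the idea of a sample map, applied to rows.
feature_map = [
    # (experiment_a, feature_a, experiment_b, feature_b)
    ("taxa", "s__Escherichia_coli", "pathways", "PWY-A"),
    ("taxa", "s__Escherichia_coli", "pathways", "PWY-B"),
    ("taxa", "s__Bacteroides_vulgatus", "pathways", "PWY-A"),
]


def linked_features(feature_map, experiment, feature):
    """Return (experiment, feature) pairs linked to the given feature.

    Links are treated as undirected, so a lookup works from either side.
    """
    linked = []
    for exp_a, feat_a, exp_b, feat_b in feature_map:
        if (exp_a, feat_a) == (experiment, feature):
            linked.append((exp_b, feat_b))
        elif (exp_b, feat_b) == (experiment, feature):
            linked.append((exp_a, feat_a))
    return linked
```

A table like this could in principle be kept in the experiment metadata, while a per-row annotation in rowData would cover the simpler case where each feature links to a single counterpart.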

antagomir avatar Dec 23 '21 08:12 antagomir

This would require defining an additional class, if no such class is available in BioC, since MAE links samples, not features, as @antagomir pointed out.

The requirements would be as follows:

  • To be compatible with MAEs it would need to extend from TSE
  • A hard-coded alternative TSE slot to hold the "mirror" data, also as a TSE
  • A hard-coded slot for linking data (also allowing for non-linked data?)
  • An invert function would need to be added to switch between species and gene data
  • A getter/setter pair for the alternative data slot
  • All the necessary reimplementation of functions from the TSE, SCE and SE universe (This is not hard, but probably a bit of work: Each call would need to be applied twice to data and the alternative data and the result recombined)

The downside is that it would allow only two types of data mirroring each other, rather than the large number of data types that MAE supports.

However, I think this can be rationalized in this instance, since the number of samples has to be equal in both cases (a limitation that MAE does not impose) and the type of data is very specific to microbiome data analysis. I would call the class MicrobiomeExperiment 😆 🤣
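
The requirements listed above can be sketched roughly as follows. This is a plain-Python illustration of the idea only, not an S4 design and not an existing Bioconductor class: `Experiment` stands in for a TSE, and the names `LinkedExperiment`, `links`, `invert` and `subset_samples` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Experiment:
    """Stand-in for a TSE: features x samples matrix with dimnames."""
    features: list
    samples: list
    assay: list  # one row (list of values) per feature


@dataclass
class LinkedExperiment:
    """Hypothetical two-view container with a hard-coded mirror slot."""
    main: Experiment
    mirror: Experiment
    links: dict = field(default_factory=dict)  # main feature -> mirror features

    def __post_init__(self):
        # Both views must describe exactly the same samples.
        if self.main.samples != self.mirror.samples:
            raise ValueError("both views must share the same samples")

    def invert(self):
        """Swap main and mirror, reversing the direction of the links."""
        inverted = {}
        for main_feature, mirror_features in self.links.items():
            for mirror_feature in mirror_features:
                inverted.setdefault(mirror_feature, []).append(main_feature)
        return LinkedExperiment(self.mirror, self.main, inverted)

    def subset_samples(self, keep):
        """Apply the same operation to both views, then recombine."""
        return LinkedExperiment(
            _subset(self.main, keep), _subset(self.mirror, keep), dict(self.links)
        )


def _subset(experiment, keep):
    """Keep only the named samples (columns) of one view."""
    idx = [experiment.samples.index(s) for s in keep]
    return Experiment(
        list(experiment.features),
        [experiment.samples[i] for i in idx],
        [[row[i] for i in idx] for row in experiment.assay],
    )


taxa = Experiment(["E_coli", "B_vulgatus"], ["S1", "S2"], [[3.0, 5.0], [4.0, 0.0]])
pathways = Experiment(["PWY-A", "PWY-B"], ["S1", "S2"], [[5.0, 3.0], [2.5, 2.0]])
linked = LinkedExperiment(
    taxa, pathways, {"E_coli": ["PWY-A", "PWY-B"], "B_vulgatus": ["PWY-A"]}
)
by_pathway = linked.invert()
```

`subset_samples` shows the "apply twice and recombine" pattern that every reimplemented function would need, and features absent from `links` simply stay unlinked.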

FelixErnst avatar Dec 31 '21 21:12 FelixErnst

Whoa! Well this could be useful and valuable. It is also some work. Let's see how we get there.. PRs welcome! :-)

Maybe one thing to still consider more carefully before jumping into it: if there are alternative (completely different?) solutions for operating in this space, or if the broader SE community is working on this already.

antagomir avatar Dec 31 '21 22:12 antagomir

Related to #383

antagomir avatar Jul 18 '23 22:07 antagomir

Also related to #306 #308

antagomir avatar Jul 18 '23 22:07 antagomir

Does mia::importHUMAnN() solve this one already (can we close)?

antagomir avatar Aug 05 '24 20:08 antagomir

It imports a single HUMAnN file into a TreeSE. That might be the best solution currently. The HUMAnN output has species information that is stored in rowData, but individual HUMAnN/MetaPhlAn files are not linked, if that was the idea.

TuomasBorman avatar Aug 05 '24 20:08 TuomasBorman

Yes, two different issues:

  1. importing functional predictions with importHUMAnN() into TreeSE or similar
  2. linking taxonomic and functional data via MAE

It seems that we have solved (1) satisfactorily now.

The second issue remains open. Not sure if it is feasible to provide a general solution.

However, we could transfer the issue to OMA and demonstrate how to use MAE (or altExp might work even better as the samples match one-to-one) in linking the two types of data.

antagomir avatar Aug 05 '24 20:08 antagomir