
Enable stacked barplots for feature metadata

Open FranckLejzerowicz opened this issue 3 years ago • 3 comments

Hi,

I believe it is currently not possible to make stacked barplots for groups of feature metadata variables. EMPress only considers one feature metadata variable at a time, right? (e.g. one set of taxonomic levels, one set of differential values). However, some feature metadata variables might well be more insightful if presented stacked.

For example, if feature metadata is available on the amount of, say, "biomolecule A" and "biomolecule B" produced by each microbe in a tree, one may want to plot the amounts of biomolecules A and B on the same stacked barplot. I believe the only way to achieve this would be to create a dummy sample metadata file (where features would remain the rows, while ["biomolecule A", "biomolecule B"] would be specially tailored columns). However, such a solution could be too hacky and, to my understanding, only one sample metadata file can be passed.

Could a solution for such barplots be to let the user select more than one category (using checkboxes instead of the dropdown)? In the background, EMPress would then build as many "sample metadata" tables as there are selections (i.e. barplots), and allow plotting information from multiple sample metadata tables.
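As a rough illustration of that idea (a sketch only, not EMPress code: the function name `select_columns` and the dict-of-dicts layout are assumptions made for this example), the checked feature metadata columns could be pulled into a per-tip table like this:

```python
def select_columns(feature_metadata, columns):
    """Build a per-tip table holding only the selected numeric columns.

    feature_metadata: dict mapping tip name -> dict of column -> value.
    This layout is hypothetical; EMPress stores feature metadata differently.
    """
    table = {}
    for tip, row in feature_metadata.items():
        values = []
        for col in columns:
            if col not in row:
                raise KeyError(f"{col!r} missing for tip {tip!r}")
            value = float(row[col])  # raises ValueError for non-numeric text
            if value < 0:
                raise ValueError(f"negative value for {col!r} at tip {tip!r}")
            values.append(value)
        table[tip] = values
    return table


# Made-up example values, for illustration only.
fm = {
    "tip1": {"biomolecule A": "2", "biomolecule B": "6", "Phylum": "Firmicutes"},
    "tip2": {"biomolecule A": "1", "biomolecule B": "1", "Phylum": "Bacteroidetes"},
}
stacks = select_columns(fm, ["biomolecule A", "biomolecule B"])
```

Each tip then carries one value per selected column, in the order the columns were checked, which is the shape a stacked barplot layer would need.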

Also, note that the nice ability to represent bar heights as a function of a feature metadata value (when continuous), which I believe is also currently limited to a single feature metadata variable, would benefit from being available for multiple variables, if these could be "interpreted" in EMPress as samples in a sample metadata.

Sorry, I have not looked at the code to flesh out this proposal. Will do if time allows!

Thanks!

FranckLejzerowicz, Mar 25 '21 22:03

Thank you for the suggestion! I think I understand your idea here -- this would involve allowing users to select multiple "quantitative" feature metadata categories, and then just directly plotting those as proportions in a stacked barplot?

Yeah, EMPress currently doesn't support that -- however, I think rearranging the code to do this should be doable without a ton of effort, since we already have code to draw these kinds of barplots.

To clarify a few points:

  1. Do you think it would be best if these stacked barplots all have the same "length" across each tip (like how sample metadata barplots currently work), or would it be better for you if the stacked barplots' lengths vary based on the total sum of the categories? (Or would both possibilities be meaningful for this application?) I think we could support both strategies. (Some examples below.)

    [Images: "Constant length stacked barplots (Diet ring)" and "Varying length stacked barplots"]

    Screenshot references described in #201.

  2. Do you anticipate there being lots of feature metadata categories to include at once in these barplots? (e.g. could there be, say, hundreds of biomolecule categories to include?) If so, we may want to explore ways of automatically creating these barplots based on a list of categories, to avoid making users manually click 100 checkboxes or something.

  3. Do you have an (ideally small) example dataset describing this? This would help with testing.
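To make the two strategies in point 1 concrete, here is a minimal sketch in plain Python. It is not EMPress code: the function name `stacked_segments`, the `mode` values, and the example numbers are assumptions made for illustration. Each tip's values are turned into (start, end) segments along a bar whose maximum length is 1.

```python
def stacked_segments(values, mode="constant", max_total=None):
    """Return (start, end) pairs along a bar of unit maximum length.

    mode="constant": every bar has length 1; segments are proportions.
    mode="varying": bar length is the tip total divided by max_total.
    """
    total = sum(values)
    if total == 0:
        # Nothing to draw for this tip.
        return [(0.0, 0.0) for _ in values]
    if mode == "constant":
        scale = 1.0 / total
    elif mode == "varying":
        if not max_total:
            raise ValueError("max_total is required for mode='varying'")
        scale = 1.0 / max_total
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    segments, start = [], 0.0
    for v in values:
        end = start + v * scale
        segments.append((start, end))
        start = end
    return segments


# Made-up per-tip values for two stacked categories.
tips = {"tip1": [2.0, 6.0], "tip2": [1.0, 1.0]}
largest = max(sum(v) for v in tips.values())
constant = {t: stacked_segments(v) for t, v in tips.items()}
varying = {t: stacked_segments(v, "varying", largest) for t, v in tips.items()}
```

With constant lengths both tips fill the whole bar and only the proportions differ; with varying lengths the tip with the smaller total gets a shorter bar, which is only meaningful when the stacked categories share a scale.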

fedarko, Mar 30 '21 19:03

Yes! A way to stack feature metadata variables into a sample metadata made up on the fly could certainly do the trick for a single barplot.

Now for your questions, I think that:

  1. Both possibilities would be meaningful (if the feature metadata variables to stack are expressed on the same scale, then the total bar lengths would make sense).
  2. I have thought about this, and I guess checkboxes would do, as the number of variables to check for a barplot should be low. Indeed, a barplot with too many stacks would be pretty unclear, not to mention the color recycling issue. Now - IMO - this is already a problem with the current barplots: a lot of clicking (allowing a config file as in iTOL would actually be great, but that is another, significant "feature request").
  3. I can send you the fibers food tree: ca. 180 tips, and its feature metadata are just four columns. Where shall I send this?

Thanks

FranckLejzerowicz, Mar 31 '21 00:03

Thanks for the explanation! This helps a lot.

> Now - IMO - this is already a problem with the current barplots: a lot of clicking (allowing a config file as in iTOL would actually be great, but that is another, significant "feature request").

I think this will be possible when #131 is addressed in the future -- once using config files to save/load app state is possible, it should be pretty straightforward for us to automatically generate barplot configurations (in cases where people have tons of fancy barplot configurations they'd like to set up).

> I can send you the fibers food tree: ca. 180 tips, and its feature metadata are just four columns. Where shall I send this?

I guess my UCSD email works fine. I probably won't have the free time to work on this for some time, but it would be really great to support this.

fedarko, Apr 01 '21 03:04