empress
Support only coloring by the most frequent values in a field
I think this would mainly be useful for categorical feature metadata tree coloring / barplots, but it may also be useful for sample metadata barplots. Basically, this would have the user specify some k (let's say k = 10), and then this would only assign the 10 most common values colors and then map all other values to "Other" or something. Would be a nice way of making plots with massive amounts of unique values more interpretable (e.g. when coloring by species / genus in Empress, it's usually really hard to see what's going on because there are just so many different species / genera present even in moderately-sized 16S datasets).
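To make the idea concrete, here is a minimal sketch of the top-k collapsing step in Python. The function name `keep_top_k`, its parameters, and the example genus names are all hypothetical and chosen for illustration; this is not Empress's actual API, just one way the mapping could work.

```python
from collections import Counter


def keep_top_k(values, k=10, other_label="Other"):
    """Return a copy of values where everything outside the k most
    frequent values is replaced by other_label.

    Hypothetical helper for illustration only (not part of Empress).
    """
    counts = Counter(values)
    # most_common(k) returns the k (value, count) pairs with the highest
    # counts; ties are ordered by first occurrence in the input.
    top = {value for value, _ in counts.most_common(k)}
    return [v if v in top else other_label for v in values]


# Made-up genus assignments for seven features
genera = [
    "Bacteroides", "Prevotella", "Bacteroides", "Blautia",
    "Bacteroides", "Prevotella", "Roseburia",
]
collapsed = keep_top_k(genera, k=2)
print(collapsed)
```

With k = 2, only the two most common genera would keep their own colors, and the rest would share a single "Other" color. How ties at the k-th position should be broken is left open here; this sketch just takes whatever order `Counter.most_common` gives.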
I don't think this would be (easily) applicable to sample metadata tree coloring or animations, since those work by figuring out which portions of the tree are unique to certain groups -- and subsuming things into an "Other" group would probably cause a bunch of misleading "Other" blobs to get highlighted on the tree.
Idea (and "k" notation :) from @gibsramen.
It would also be nice to allow the user to specify certain values that should be excluded (and always included in `Other`) -- for example, `s__`, `Unspecified`, etc.
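A rough sketch of how the exclusion list could combine with the top-k step, again in Python. The name `keep_top_k_excluding` and the `exclude` parameter are assumptions made for illustration, not existing Empress options, and the species labels are made up. The sketch assumes excluded values should not count toward k at all.

```python
from collections import Counter


def keep_top_k_excluding(values, k=10, exclude=(), other_label="Other"):
    """Keep the k most frequent values, never counting excluded ones.

    Hypothetical helper for illustration only (not part of Empress).
    Excluded values are always mapped to other_label, no matter how
    frequent they are.
    """
    excluded = set(exclude)
    # Rank only the values that are allowed to get their own color.
    counts = Counter(v for v in values if v not in excluded)
    top = {value for value, _ in counts.most_common(k)}
    return [v if v in top else other_label for v in values]


# Made-up species labels; "s__" is the most frequent value here
species = [
    "s__", "s__", "s__", "s__coli", "s__coli",
    "Unspecified", "Unspecified", "s__fragilis",
]
result = keep_top_k_excluding(species, k=1, exclude=["s__", "Unspecified"])
print(result)
```

Filtering before ranking means an uninformative value like `s__` can't take up one of the k color slots, which seems like the point of the exclusion list.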