
Support only coloring by the most frequent values in a field

Open • fedarko opened this issue 3 years ago • 1 comment

I think this would mainly be useful for categorical feature metadata tree coloring / barplots, but it may also be useful for sample metadata barplots. Basically, the user would specify some k (let's say k = 10); only the 10 most common values would be assigned colors, and all other values would be mapped to "Other" or something similar. This would be a nice way of making plots with massive numbers of unique values more interpretable (e.g. when coloring by species / genus in Empress, it's usually really hard to see what's going on because so many different species / genera are present, even in moderately-sized 16S datasets).
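A minimal sketch of the proposed "top k, everything else is Other" mapping, written in Python purely for illustration. The function name `top_k_color_groups` and the `other_label` parameter are hypothetical and are not part of Empress's API. It assumes ties at the k-th position are broken by first occurrence, which is how `collections.Counter.most_common` orders equal counts.

```python
from collections import Counter


def top_k_color_groups(values, k, other_label="Other"):
    """Hypothetical helper: keep the k most frequent values, collapse the rest.

    Each value among the k most common keeps its own label (and so would get
    its own color); every other value is replaced with other_label.
    Ties at the k-th position are broken by first occurrence, because
    Counter.most_common orders equal counts by insertion order.
    """
    counts = Counter(values)
    kept = {value for value, _ in counts.most_common(k)}
    return [v if v in kept else other_label for v in values]


# Example: with k = 2, only the two most common genera keep their labels.
genera = ["g__A", "g__B", "g__A", "g__C", "g__A", "g__B", "g__D"]
print(top_k_color_groups(genera, 2))
```

Since the mapping runs before colors are assigned, the color scale would only ever need k + 1 entries regardless of how many unique values the field has.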

I don't think this would be (easily) applicable to sample metadata tree coloring or animations, since those work by identifying portions of the tree that are unique to certain groups, and subsuming things into an "Other" group would probably cause a bunch of misleading "Other" blobs to get highlighted on the tree.

Idea (and "k" notation :) from @gibsramen.

fedarko • Sep 03 '20 18:09

It would also be nice to allow the user to specify certain values that should be excluded from the top k (and always grouped into Other), for example s__, Unspecified, etc.
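One possible way to combine the exclusion list with the top-k idea, again as a hypothetical Python sketch (`top_k_with_exclusions` and its parameters are illustrative names, not Empress functionality). It assumes excluded values do not count toward k: they are dropped before counting, so a very common placeholder like s__ cannot use up one of the k color slots, and they are always mapped to "Other".

```python
from collections import Counter


def top_k_with_exclusions(values, k, excluded=(), other_label="Other"):
    """Hypothetical helper: top-k coloring that never colors excluded values.

    Assumption: excluded values are removed before counting, so they never
    occupy one of the k slots; they are always collapsed into other_label.
    Ties at the k-th position are broken by first occurrence.
    """
    excluded = set(excluded)
    counts = Counter(v for v in values if v not in excluded)
    kept = {value for value, _ in counts.most_common(k)}
    return [v if v in kept else other_label for v in values]


# Example: "s__" is the most frequent value, but it is excluded,
# so the single color slot (k = 1) goes to "s__foo" instead.
species = ["s__", "s__", "s__", "s__foo", "Unspecified", "s__bar", "s__foo"]
print(top_k_with_exclusions(species, 1, excluded={"s__", "Unspecified"}))
```

An alternative design would count excluded values toward k and only hide their colors, but that would leave fewer than k distinct colors on the plot, which seems to work against the goal of making the top values readable.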

fedarko • Oct 01 '20 23:10