
seaborn.objects `so.Stack()` incompatible with `so.Range()` (errorbar)

Open respatte opened this issue 2 years ago • 3 comments

I've come across a situation where I had to use so.Stack() to represent some data with a barplot where so.Dodge() was definitely not an option. As I am using so.Agg() to show the mean for different groups, I wanted to add error bars, but it seems that so.Stack() does not affect so.Range() and instead puts the error bar where it would be if no stacking took place. Here's a reproducible example using the penguin dataset:

import seaborn as sns
import seaborn.objects as so

penguins = sns.load_dataset("penguins")

(
    so.Plot(penguins, x="species", y="body_mass_g", color="sex")
    .add(so.Bar(), so.Agg(), so.Stack())
    .add(so.Range(), so.Est(errorbar="sd"), so.Stack())
    .show()
)


It might just be that so.Stack() is less developed at the moment as it is probably not used as much, but it is a shame that its behaviour is different from that of so.Dodge(). In the future, it might even be nice to be able to use so.Dodge() and so.Stack() on different dimensions of the data with the same syntax already used in so.Dodge(by=...).

respatte avatar Apr 17 '23 10:04 respatte

so.Stack does not affect ymin and ymax, which set the endpoints of the Range. There is actually a TODO in the code about doing that, but I suspect it raises some issues. For instance, I'd expect you would use Stack when the quantities actually sum to something meaningful (e.g. a total number of customers, split by gender); in that case it is a little weird to have error bars for each part: what happens with the total error? Also, I guess there may be use cases where you use Range not for error bars but for something else, where this behaviour would not be correct. Anyway, I guess you would be better off writing your own object for now.

thuiop avatar Apr 18 '23 14:04 thuiop

That makes a lot of sense. I think it would be a great addition to allow the error bars to be computed either for each stacked group or for the whole combined stack. Those two cases certainly seem sensible and would probably cover most situations in which people want to plot error bars. Not something very urgent though, as it is not exactly a very common situation!
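To make the two options concrete, here is a rough sketch in plain Python of how the interval endpoints could be computed by hand. The sample values are made up, and the combined error assumes the groups are independent so that their variances add; that assumption is mine and may not hold for real data. None of this is seaborn API.

```python
import math
import statistics

# Hypothetical body-mass samples (grams) for two groups stacked in one bar.
samples = {
    "Female": [3400.0, 3600.0, 3800.0],
    "Male": [3900.0, 4100.0, 4300.0],
}

means = {group: statistics.mean(values) for group, values in samples.items()}
sds = {group: statistics.stdev(values) for group, values in samples.items()}

# Option 1: one error bar per stacked segment, centred on the top of that segment.
per_segment = {}
running_top = 0.0
for group in samples:
    running_top += means[group]
    per_segment[group] = (running_top - sds[group], running_top + sds[group])

# Option 2: a single error bar for the whole stack.
# ASSUMPTION: groups are independent, so standard deviations combine in quadrature.
total = sum(means.values())
total_sd = math.sqrt(sum(sd ** 2 for sd in sds.values()))
whole_stack = (total - total_sd, total + total_sd)

print(per_segment)
print(whole_stack)
```

In the first option each interval only reflects the spread of its own group, even though it is drawn on top of the groups below it; in the second the interval describes the sum, which is wider than any single group's interval.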

respatte avatar Apr 19 '23 12:04 respatte

Yeah, I think the right behavior here is a little bit undefined.

Statistically, it feels like the error bars on stacked plots should accumulate error, but that's probably not what most people want or expect. Personally I am pretty negatively inclined towards stacking + error bars but one design principle of the objects interface is "let people do weird stuff that doesn't make sense to avoid blocking weird stuff that does" so I guess it should be supported :)

With that said, it should be fairly straightforward for Stack to adjust ymin and ymax by the same amount it adjusts each row's y. Supporting error accumulation is probably out of scope and something you'd need to plug in a custom Move for. It would also be necessary to figure out where best to handle a "default baseline" for marks where the concept of a baseline does not make sense. It may be the case that this only works for "compound marks" (#3120).
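As an illustration of "adjust ymin and ymax by the same amount as y", here is a minimal pandas sketch. It is not the Stack implementation: the helper name stack_with_range and the toy numbers are hypothetical, and it assumes the data has already been aggregated to one row per group with y, ymin and ymax columns and an implicit baseline of 0.

```python
import pandas as pd

# Hypothetical aggregated data: one row per (x, color) group,
# with an interval (ymin, ymax) around each estimate y.
agg = pd.DataFrame({
    "x": ["Adelie", "Adelie", "Gentoo", "Gentoo"],
    "color": ["Female", "Male", "Female", "Male"],
    "y": [3400.0, 4000.0, 4700.0, 5500.0],
    "ymin": [3100.0, 3700.0, 4400.0, 5200.0],
    "ymax": [3700.0, 4300.0, 5000.0, 5800.0],
})


def stack_with_range(df: pd.DataFrame) -> pd.DataFrame:
    """Stack y within each x position and shift ymin/ymax by the same offset."""
    out = df.copy()
    # Offset of each row = sum of the y values stacked beneath it at the same x.
    offset = out.groupby("x")["y"].cumsum() - out["y"]
    for col in ("y", "ymin", "ymax"):
        out[col] = out[col] + offset
    # Each segment now starts where the previous one ended.
    out["baseline"] = offset
    return out


stacked = stack_with_range(agg)
print(stacked)
```

The interval width of each row is unchanged; only its position moves, so the error bar sits on top of its own segment rather than at the unstacked height.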

mwaskom avatar Apr 25 '23 23:04 mwaskom