`so.Hist`: Document and propagate `weight` parameter

Open sschuldenzucker opened this issue 1 year ago • 1 comments

so.Hist() has a currently undocumented weight parameter that can be used to make a weighted histogram, like so:

import seaborn as sns
import seaborn.objects as so

df = sns.load_dataset('titanic')
df = df[['age', 'fare']].dropna()  # can't deal well with NaNs. This is not the bug reported!

# Total fare collected for each age group:
(
    so.Plot(df, x='age')
    .add(so.Bars(), so.Hist(bins=10), weight='fare')
)
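For readers unfamiliar with weighted histograms, here is a minimal pure-Python sketch of the computation the plot above relies on: each observation adds its weight (here, the fare) to its bin instead of adding 1. This is not seaborn's implementation; the function name `weighted_hist` and the toy data are made up for illustration, and equal-width bins over the data range are an assumption.

```python
# Sketch of what a weighted histogram computes; not seaborn's implementation.
def weighted_hist(values, weights, bins):
    """Sum the weights of the values falling into each of `bins` equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins  # assumes the values are not all identical
    totals = [0.0] * bins
    for v, w in zip(values, weights):
        # Clamp so the maximum value lands in the last bin (right edge inclusive).
        i = min(int((v - lo) / width), bins - 1)
        totals[i] += w
    return totals


# Hypothetical toy data: passenger ages and the fare each passenger paid.
ages = [2, 9, 22, 25, 38, 40]
fares = [10.0, 20.0, 7.0, 8.0, 50.0, 30.0]

# Total fare collected per age group, using two bins: [2, 21) and [21, 40].
totals = weighted_hist(ages, fares, bins=2)
print(totals)
```

With unit weights the same function reduces to an ordinary count histogram.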

This is useful but:

  • It's currently undocumented. It should be mentioned in the documentation of so.Hist.
  • weight currently has to be provided like above, i.e., as a parameter to .add(). It can't be provided as...
    • a parameter to so.Hist() --- This may or may not be right, I don't know the design ideology so well.
    • a parameter to so.Plot() --- This looks like it should be possible based on the design ideology?

sschuldenzucker • Jan 26 '23 00:01

> a parameter to so.Hist()

This wouldn't make sense unless you want to use the same weight for all values (when would you want that?).

> a parameter to so.Plot()

The Plot constructor accepts mappable properties of the mark, but weight is different, as it's only used in the stat computation. Plot needs to "know" about mappable properties (so that it can set up scales, etc.), and so it can expose them in its constructor. But that's not true for stat properties.

The flip side is that, in principle, a third-party Stat could accept and use a variable passed in its layer without anything else in the system needing to know about it. The upshot is that stat variables need to be passed at the level of a specific layer.
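The distinction described above can be illustrated with a toy model. This is only a sketch of the design argument, not seaborn code: `MAPPABLE`, `make_plot`, `add_layer`, and `weighted_total` are all hypothetical names, and the real Plot and Stat classes work differently in detail.

```python
# Toy model of the design argument; none of these names are seaborn's API.
MAPPABLE = {"x", "y", "color"}  # properties the plot knows how to scale


def make_plot(**variables):
    """Plot level: accept only variables the plot can set up scales for."""
    unknown = set(variables) - MAPPABLE
    if unknown:
        raise TypeError(f"unknown plot-level variables: {sorted(unknown)}")
    return {"variables": variables, "layers": []}


def add_layer(plot, stat, **layer_variables):
    """Layer level: forward every variable to the stat without inspecting it."""
    merged = {**plot["variables"], **layer_variables}
    plot["layers"].append(stat(merged))
    return plot


def weighted_total(variables):
    """Hypothetical third-party stat that reads its own `weight` variable."""
    weights = variables.get("weight", [1] * len(variables["x"]))
    return sum(weights)


plot = make_plot(x=[1, 2, 3])
add_layer(plot, weighted_total, weight=[2.0, 3.0, 5.0])
print(plot["layers"])
```

In this model the plot never needs to learn what `weight` means; only the stat that consumes it does, which is why the variable has to be supplied where the stat is attached.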

mwaskom • Feb 02 '23 00:02