Feature request: expand seaborn API to expose artists and other information

Open normanius opened this issue 3 years ago • 1 comments

Seaborn is capable of producing very neat plots out of the box. Nevertheless, "post-production" editing of the resulting plots is often required to improve readability, add information, or match the aesthetics to those of the target medium.

There are already plenty of ways to modify the aesthetics: styles/themes in seaborn or matplotlib, parameters that control aesthetics in figure- or axes-level plots (palette, hue_order, kwargs forwarded to the underlying plt function, etc.), and possibly more.

The issue at hand, however, addresses data-aware, artist-level adjustments, which are poorly supported by the current API. In particular, I thought of the following use cases:

  • visually emphasize or annotate particular axis artists
  • grouping-aware adjustments of aesthetics (see here for an example)

In principle, the above could be achieved by searching and modifying the artists created for an axis (see, for instance, the tutorial on artists, Axes.get_children(), and Axes.get_lines()). However, this is unsatisfactory because one ends up reverse-engineering parts of the internal logic of seaborn functions, which is error-prone and may break when that logic changes between package versions.
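To make the search-and-modify approach concrete, here is a minimal sketch using plain matplotlib. The function plot_groups is a hypothetical stand-in for a library call that creates artists internally; it is not a seaborn function, and the assumption that it stores the group name as the line label is exactly the kind of internal detail one has to guess.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt


def plot_groups(ax, groups):
    """Hypothetical stand-in for a library call that hides artist creation."""
    for name, ys in groups.items():
        ax.plot(range(len(ys)), ys, label=name)


fig, ax = plt.subplots()
plot_groups(ax, {"smoker": [1, 3, 2], "non-smoker": [2, 1, 3]})

# Recover the line of one group by searching the axes after the fact.
# This only works because we know (or guessed) that the label holds the group name.
target = [line for line in ax.get_lines() if line.get_label() == "smoker"]
for line in target:
    line.set_linewidth(4)
```

If the plotting function changed how it labels or orders its artists, the filter above would silently match nothing or the wrong line.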

Instead, seaborn could expose artist information in a structured way. For instance, one could extend the API functions with an optional container argument (called artists in this example):

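import seaborn as sns

artists = {}  # proposed: populated by seaborn with the artists it creates
tips = sns.load_dataset("tips")
# Note: `artists=` is the proposed argument, not part of the current seaborn API.
sns.boxplot(x="day", y="total_bill", hue="smoker", data=tips, artists=artists)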

Alternatively, one could work with data classes, which might be better suited for a package API (and would facilitate documentation).
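As a rough sketch of the data-class variant: the names BoxArtists and boxplot_with_artists below are hypothetical and only illustrate the shape such a return value could take. The sketch wraps matplotlib's own Axes.boxplot, which returns a dict of artist lists, rather than any seaborn internals.

```python
from dataclasses import dataclass
from typing import List

import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D


@dataclass
class BoxArtists:
    """Hypothetical record: the artists drawn for one group of a boxplot."""

    label: str
    box: Line2D
    median: Line2D
    whiskers: List[Line2D]


def boxplot_with_artists(ax, data_by_label):
    """Draw one box per label and return one BoxArtists record per group."""
    labels = list(data_by_label)
    parts = ax.boxplot([data_by_label[k] for k in labels])
    ax.set_xticks(list(range(1, len(labels) + 1)))
    ax.set_xticklabels(labels)
    return [
        BoxArtists(
            label=label,
            box=parts["boxes"][i],
            median=parts["medians"][i],
            # matplotlib stores two whiskers (lower, upper) per box, in order
            whiskers=parts["whiskers"][2 * i : 2 * i + 2],
        )
        for i, label in enumerate(labels)
    ]


fig, ax = plt.subplots()
records = boxplot_with_artists(
    ax, {"Thur": [10, 12, 15, 20], "Fri": [8, 14, 18, 30]}
)

# Emphasize one group by name, without guessing the artist order.
for rec in records:
    if rec.label == "Fri":
        rec.median.set_color("red")
        rec.median.set_linewidth(3)
```

Named fields make the grouping explicit and give the documentation something concrete to describe, at the cost of one record type per plot kind.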

How exactly the information is structured certainly requires further analysis and discussion. It is conceivable to use such a container to also expose other information, such as aggregated data, interpolations, or estimated distribution parameters. This would be useful:

  • to avoid duplicated computations
  • for debugging

normanius, Apr 28 '21 14:04

I would like to have something like this. I have typically figured it would take the form of seaborn adding an attribute onto the matplotlib axes where it stores references to the artists it added. The tricky bit has always been what the right data structure is so that this is actually useful. There may be an open issue with more of my thoughts; I'm not sure. It's come up before.
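A minimal sketch of that idea, using plain matplotlib: the attribute name added_artists and the function plot_groups are hypothetical, and the nested-dict layout is just one possible answer to the data-structure question, not a settled design.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt


def plot_groups(ax, groups):
    """Hypothetical plotting function that records the artists it adds."""
    lines = {}
    for name, ys in groups.items():
        (line,) = ax.plot(range(len(ys)), ys, label=name)
        lines[name] = line
    # Hypothetical attribute: references to the artists this call created,
    # keyed by artist kind and then by group name.
    ax.added_artists = {"lines": lines}
    return ax


fig, ax = plt.subplots()
plot_groups(ax, {"Thur": [1, 3, 2], "Fri": [2, 1, 3]})

# Post-hoc adjustment by group name, with no searching through ax.get_lines().
ax.added_artists["lines"]["Fri"].set_linestyle("--")
```

A second call on the same axes would overwrite the attribute in this sketch, so a real design would need to decide whether to merge, append, or namespace the entries.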

Passing in a mutable object that then gets populated with the artists in a similar way is not a bad idea altogether, although a lot of users find Python's pass-by-reference semantics confusing, so it might be underused. Also, what happens when you pass the same dictionary to two different function calls?
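The shared-dictionary question can be illustrated without any plotting library. In this sketch, fake_plot is a hypothetical function and the strings are placeholders for real artists; it picks one possible policy (accumulate per key) purely to show the behavior that would have to be specified.

```python
def fake_plot(name, artists=None):
    """Hypothetical plotting function that fills a caller-supplied dict."""
    # Placeholders standing in for the artist objects a real call would create.
    created = [f"{name}-artist-{i}" for i in range(2)]
    if artists is not None:
        # Policy chosen for this sketch: accumulate under the function's name.
        artists.setdefault(name, []).extend(created)
    return created


shared = {}
fake_plot("boxplot", artists=shared)
fake_plot("stripplot", artists=shared)
fake_plot("boxplot", artists=shared)  # same dict, same key, second call

print(shared)
```

With accumulation, the two boxplot calls end up mixed in one list; with overwriting, the first call's artists would be lost. Either way, the caller has to know which rule applies.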

Modifying the artists that matplotlib stores is the current recommended power-user solution here. I agree it's not ideal because it requires some knowledge of what seaborn is doing internally. But in my experience these post-hoc modifications typically come up only when you're doing some sort of advanced figure polishing.

> Nevertheless, "post-production" editing of the resulting plots is often required to improve readability, add information, or match the aesthetics to those of target medium.

I guess I would contest that it is "often required", but people have different workflows. Between rcParams and artist kwargs, it should be rare that you need to modify the artists post-hoc just for aesthetic or readability reasons. Also, if you're not happy with what seaborn does and need to do a lot of post-processing, it might be easier to just drop down to matplotlib directly.

mwaskom, May 01 '21 22:05