seaborn
Add a parameter that cleans up axes labels and legend titles?
DataFrame column labels usually have lowercased, underscored names. These get automatically added to seaborn plots. But wouldn't it be nice if we could transform these into spaced, title-cased labels? Python string processing makes this pretty easy to do. It could be a single parameter (what's a good name?), although the logic would need to live in a couple of different places until axis labeling and legend titling are fully centralized.
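For illustration, the cleanup rule could be as small as the sketch below. The name `prettify_label` is hypothetical and is not part of seaborn; it only shows the "replace underscores, then title-case" idea using built-in string methods.

```python
def prettify_label(name):
    """Turn a column name like "bill_length_mm" into "Bill Length Mm"."""
    # str() lets non-string column labels (e.g. integers) pass through safely
    return str(name).replace("_", " ").title()


# Hypothetical column names, used only to show the transformation
for column in ["bill_length_mm", "species", "rt_area"]:
    print(column, "->", prettify_label(column))
```

Note that the rule knows nothing about units or abbreviations, so a name like bill_length_mm comes out with "Mm" capitalized.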
I would like this feature as well, and I'm surprised it doesn't exist. I think an implementation might have a parameter called label_mapping, which maps the column names in the data frame to nice human-readable names that will be displayed on the axes and legend. For instance something like this:
label_mapping = {
    'cname': 'Class Name',
    'rt_area': 'sqrt(area)',
}
ax = sns.histplot(data, x='rt_area', hue='cname', label_mapping=label_mapping)
Current minimal workaround is as follows:
text = ax.get_ylabel()
ax.set_ylabel(label_mapping.get(text, text))
text = ax.get_xlabel()
ax.set_xlabel(label_mapping.get(text, text))
title = ax.get_legend().get_title()
text = title.get_text()
title.set_text(label_mapping.get(text, text))
Hm, that's not exactly what I have in mind here. There's a distinction between simple rules for automatically cleaning up the texts and adding a different API for polishing the figure, and seaborn tries to avoid the latter, generally speaking.
BTW a generalized ~3 liner given a label_mapping dict would look something like:
import seaborn as sns
import matplotlib as mpl
penguins = sns.load_dataset("penguins")
ax = sns.histplot(data=penguins, x="bill_length_mm", hue="species")
label_map = {
    "bill_length_mm": "Bill length (mm)",
    "species": "Species",
}
for text_obj in ax.findobj(mpl.text.Text):
    if text := label_map.get(text_obj.get_text(), ""):
        text_obj.set_text(text)
Thanks, that's a better snippet than mine.
I'd argue that the first case falls into the latter case of polishing the figure. I think an API to add simple cleanup rules would likely add about as much complexity as an API for polishing in terms of developer brain-space.
The former case seems like it's just trying to be a weak (or automatic) version of the latter. It's a useful heuristic, but it would obviously fail in certain cases (like rt_area -> Rt Area). It might also cause confusion if people saw that seaborn had an API for minor cleanup of a figure but lacked the ability to fully customize it.
However, I'm not so sure a polish API is such a bad thing, it just needs to be carefully designed. I think the current palette argument is a great example of it. Seaborn automatically chooses colors very well, but sometimes a customer says: can we make that blue? And then you have to make it blue.
I think such a rule-based approach and a customized approach could coexist via the same parameter (similar to the way other seaborn arguments work). Perhaps a rough first draft is label_polish=None, which does nothing by default; if label_polish == 'auto', perform the caps-based rule, and if label_polish is a dictionary, map the column names to the "nice" names. Or maybe if it's a function, it can be some custom rule that's called on each text object.
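To make that dispatch concrete, here is a minimal sketch of how a single argument could cover all four cases. Both `polish_label` and `label_polish` are hypothetical names from this discussion, not seaborn API, and for simplicity the callable is applied to the label string rather than to the matplotlib text object.

```python
def polish_label(name, label_polish=None):
    """Resolve one label according to a hypothetical label_polish argument."""
    if label_polish is None:
        # Default: leave the column name untouched
        return name
    if label_polish == "auto":
        # Rule-based cleanup: underscores to spaces, then title case
        return name.replace("_", " ").title()
    if callable(label_polish):
        # Custom rule supplied by the user
        return label_polish(name)
    # Otherwise assume a dict-like mapping; unmapped names pass through
    return label_polish.get(name, name)


mapping = {"cname": "Class Name", "rt_area": "sqrt(area)"}
for option in (None, "auto", mapping, str.upper):
    print(repr(polish_label("rt_area", option)))
```

Names missing from the dictionary fall back to the raw column name here; whether they should fall back to the 'auto' rule instead is an open design choice.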
Not sure if my name is the best name. Given the complexity that more parameters add to the API, and the relative ease of postprocessing the plot, I'm not convinced that either approach is correct. But on the flip side, I do love how seaborn commands let me make near-perfect publication-quality plots by customizing arguments to a single function call, so maybe a polish API might be nice.
Acronyms are a pretty good argument against the original proposal here, and probably strong enough to kill it.
Your proposal is interesting, however, though not without arguments against.
It would be a pain to add a new parameter to every function, and I think my example shows that this logic could be handled through a polishing function that is independent of the main seaborn logic. So I am not sure it is wise to add it to the plotting functions. But there is also not much precedent for "small utility functions" in seaborn. Originally there were a couple, but I've actively tried to avoid adding more and to deprecate the ones that exist.
The main cost would be in cognitive load: there's an existing matplotlib API for polishing figure labels (i.e., changing them), and people would have to learn when to use the new "auto polishing" tools, which will necessarily be limited in some ways. I guess the question is in what cases it would be really useful to have a dictionary-based mapping, when most people's workflow of calling seaborn and then calling set(xlabel="A nicer label", ylabel="This gets a label too") seems to work pretty well and is probably less verbose for a couple of plots. I guess once you have a dictionary that exists and a simple parameter or function call to apply it, you're doing less typing once you're polishing ~10 plots. But then are you really in need of "polishing"?
Alternatively, it could be proposed to matplotlib as a new Axes method. Not sure if they'd go for it.
I agree cognitive load is a huge cost. The argspec of seaborn functions is already pretty big, and I can only see more "polish-style" arguments being necessary if the first one is integrated. At this point I'm convinced that adding a polish API to seaborn plotting functions is not the correct thing to do.
Thoughts on a secondary polish API
However, I do think the use case of "having human readable names on the axes" is relevant and worth addressing. I'm interested in how some sort of postprocessing function might fit into seaborn.
Using your snippet in my code I ended up going with this:
def polish(ax, label_mapping=None):
    import matplotlib as mpl
    if label_mapping:
        for text_obj in ax.findobj(mpl.text.Text):
            text = text_obj.get_text()
            text = label_mapping.get(text, '')
            if text:
                text_obj.set_text(text)
    return ax
and called it like this:
ax = sns.histplot(
    data, x=key, hue='cname', palette=catname_to_colors,
    **kw
)
polish(ax, label_mapping)
And of course the polish function could become more complex where it performed the "auto" behavior described above, but I went with the minimal thing for now. I actually like the idea of having something like seaborn.polish.
Also it opens possibilities like using function-composition to create seaborn plots that do have a polish API similar to the original idea:
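def polish_decor(plot_func):
    import inspect
    # Keyword arguments of polish() that have defaults (e.g. label_mapping).
    # inspect.getargspec was removed in Python 3.11, so use inspect.signature.
    polish_keys = [
        name for name, param in inspect.signature(polish).parameters.items()
        if param.default is not inspect.Parameter.empty
    ]
    def _wrapped_plot(*args, **kwargs):
        # Pull the polish-only arguments out before calling the plot function
        polish_kw = {
            k: kwargs.pop(k)
            for k in polish_keys if k in kwargs
        }
        ax = plot_func(*args, **kwargs)
        ax = polish(ax, **polish_kw)
        return ax
    return _wrapped_plot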
@polish_decor
def histplot(*args, **kwargs):
    ...
# OR
my_histplot = polish_decor(sns.histplot)
# Or even something as dirty as this where the user monkey patches
# seaborn to allow this behavior
import inspect
for attrname in dir(sns):
    if not attrname.startswith('_'):
        attr = getattr(sns, attrname)
        if callable(attr):
            # Some criterion to filter to only the plot functions
            sig = inspect.signature(attr)
            if 'data' in sig.parameters:
                wrapped = polish_decor(attr)
                setattr(sns, attrname, wrapped)
I wouldn't advocate for the above code going into seaborn; I'm just noting how a secondary function call could be wrapped into a single call for users who were into that sort of thing.
The need for polishing
But then are you really in need of "polishing"?
Yes, I think there are many cases where you are.
Imagine the case where you are autogenerating 10+ plots (which I often do as part of reports) and then sending those to a customer. Typically you want to hide the scary code details to avoid frightening the customers, so at least polishing the labels (and potentially more things), might be a useful extension to seaborn. Of course I could define that polish function myself, but I do think there is a good argument for its home being in seaborn rather than in user code. I think the main question that needs to be answered is: what else might need polishing other than label names? Colors are already taken care of by palette. ATM I can't think of other plot elements that postprocessing would help with, but I'm sure they exist.
Label polishing as an extension to pandas?
Alternatively, I wonder if the "nice" human readable labels have any place in pandas? Pandas has automatic plotting, and if you were able to register "nice" names with a DataFrame, then seaborn could leverage that without any extra API. I know you can currently remap the names of a DataFrame, but I personally don't like that solution as it creates a copy and prevents you from using the more code-friendly keys in subsequent code.
To your point on matplotlib, I'm not sure if the pandas team would go for this either. But if others find that this is a compelling idea seaborn could implement an experimental feature that looks for the _label_mapping attribute in a dataframe. But personally, I prefer the secondary polish call.
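As a rough sketch of what such a lookup might look like: pandas already has an experimental DataFrame.attrs dictionary for metadata, so this version reads a hypothetical "label_mapping" key from attrs rather than a private _label_mapping attribute. The key name and the `display_label` helper are assumptions for illustration only, and a stand-in object is used so the snippet runs without pandas.

```python
from types import SimpleNamespace


def display_label(data, column):
    """Return a registered "nice" label for column, else the raw name."""
    # Objects without an attrs dict simply have no registered labels
    mapping = getattr(data, "attrs", {}).get("label_mapping", {})
    return mapping.get(column, column)


# Stand-in for a DataFrame; with pandas one would instead set
# df.attrs["label_mapping"] = {...} on the real frame.
data = SimpleNamespace(attrs={"label_mapping": {"rt_area": "sqrt(area)"}})
print(display_label(data, "rt_area"))
print(display_label(data, "cname"))
```

The code-friendly column keys stay untouched, and only the displayed text changes.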