seaborn
Palette does not support the use of defaultdict with missing values
Currently, Seaborn does not permit the use of defaultdict with missing values as a palette. A minimal example that reproduces this issue is:
import seaborn as sns
import pandas as pd
from collections import defaultdict
data = pd.DataFrame({
    "values": [1, 2, 3],
    "hues": ["foo", "bar", "baz"],
})
palette = defaultdict(lambda: "#000000", {
    "foo": "#ff0000",
    "bar": "#00ff00",
})
sns.histplot(
    x="values",
    data=data,
    hue="hues",
    palette=palette,
)
My expectation is that this should use the default value of #000000 for baz, which is missing from the palette. Instead, this raises an exception:
Traceback (most recent call last):
  File "/home/ehermes/test/seaborn_defaultdict.py", line 15, in <module>
    sns.histplot(
  File "/home/ehermes/venvs/seaborn/lib/python3.10/site-packages/seaborn/distributions.py", line 1384, in histplot
    p.map_hue(palette=palette, order=hue_order, norm=hue_norm)
  File "/home/ehermes/venvs/seaborn/lib/python3.10/site-packages/seaborn/_base.py", line 838, in map_hue
    mapping = HueMapping(self, palette, order, norm, saturation)
  File "/home/ehermes/venvs/seaborn/lib/python3.10/site-packages/seaborn/_base.py", line 150, in __init__
    levels, lookup_table = self.categorical_mapping(
  File "/home/ehermes/venvs/seaborn/lib/python3.10/site-packages/seaborn/_base.py", line 234, in categorical_mapping
    raise ValueError(err.format(missing))
ValueError: The palette dictionary is missing keys: {'baz'}
For this test, I have used seaborn-0.13.2 and matplotlib-3.8.2.
I have a fix for this problem in a personal branch (https://github.com/ehermes/seaborn/tree/palette_defaultdict), but per your contribution guidelines, I have opened a bug report first. With permission, I can also create a PR for my fix.
defaultdict is a nice pythonic solution here, but the type signature for palette is already quite complicated and I'm fairly averse to expanding it further. I'm also not convinced that setting up the defaultdict is that much more convenient than defining a full dict palette based on the data, e.g. something like
palette = {
    **{x: "k" for x in data["hues"].unique()},
    "foo": "#ff0000",
    "bar": "#00ff00",
}
Is the same LoC and avoids an import.
This is a good solution if you have the data that you will be plotting when you are first creating the palette. In our application, the palette is "statically" defined in a library, and the data we plot is generated at runtime. Sometimes the data contains entries that we did not expect to be present at the time we wrote the library, so we need to have a backup value present. My current workaround to this issue is to essentially do what you're suggesting, but I have to do it in every single function that creates a seaborn plot, which is a lot of redundant code. We could possibly simplify things through a code re-org, but my preference would be for seaborn to use the defaultdict that we have chosen for this exact reason in the expected manner.
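For illustration, the per-function workaround described above could be collected into a single helper. This is only a sketch: LIBRARY_PALETTE and resolve_palette are hypothetical names, not part of seaborn or of the application in question, and the levels are hard-coded here rather than read from a DataFrame.

```python
from collections import defaultdict

# Hypothetical stand-in for a palette defined "statically" in a library.
LIBRARY_PALETTE = defaultdict(lambda: "#000000", {
    "foo": "#ff0000",
    "bar": "#00ff00",
})


def resolve_palette(palette, levels):
    """Return a plain dict with one color per level (hypothetical helper).

    Indexing with palette[level] triggers the defaultdict fallback for
    levels the library did not anticipate. Note that this also stores
    those levels in the defaultdict itself.
    """
    return {level: palette[level] for level in levels}


# Levels as they might appear in data generated at runtime.
runtime_levels = ["foo", "bar", "baz"]
resolved = resolve_palette(LIBRARY_PALETTE, runtime_levels)
```

The resulting plain dict contains every level, so it should pass the existing missing-key check when given as the palette argument, for example with the unique values of the hue column as the levels.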
Why say “in this expected manner”? Defaultdict is not a subtype of dict and seaborn’s docs don’t suggest that it will be accepted.
Strictly speaking, defaultdict is a subtype of dict:
In [1]: from collections import defaultdict
In [2]: palette = defaultdict(lambda: "#000000", {
...: "foo": "#ff0000",
...: "bar": "#00ff00",
...: })
In [3]: isinstance(palette, dict)
Out[3]: True
When I say "in the expected manner", I mean from the "duck typing" perspective: a defaultdict behaves like a dict, and thus should be suitable for any application in which a dict is accepted. The only reason we cannot use a defaultdict as the palette for seaborn is because of an extra check that every level has a corresponding key in it, which may not be true for non-primitive dict-likes. Actually, this brings to mind an alternative possible solution, which doesn't specifically require reference to defaultdict:
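if isinstance(palette, dict):
    # Index each level rather than testing for its key, so that dict
    # subclasses defining __missing__ (e.g. defaultdict) are accepted.
    missing = set()
    for level in levels:
        try:
            palette[level]
        except KeyError:
            missing.add(level)
    # Test the set itself: any(missing) would skip falsy levels such as 0.
    if missing:
        err = "The palette dictionary is missing keys: {}"
        raise ValueError(err.format(missing))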
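To make the effect of that check concrete, here is a standalone sketch of it outside of seaborn. The function name check_palette and the sample levels are assumptions for the example, and this is not seaborn's actual code.

```python
from collections import defaultdict


def check_palette(palette, levels):
    """Raise ValueError if indexing the palette fails for any level.

    Hypothetical standalone version of the check proposed above.
    """
    missing = set()
    for level in levels:
        try:
            palette[level]
        except KeyError:
            missing.add(level)
    if missing:
        err = "The palette dictionary is missing keys: {}"
        raise ValueError(err.format(missing))


levels = ["foo", "bar", "baz"]
colors = {"foo": "#ff0000", "bar": "#00ff00"}

# A defaultdict supplies its fallback, so no level counts as missing.
check_palette(defaultdict(lambda: "#000000", colors), levels)

# A plain dict with a missing key still raises the same error message.
try:
    check_palette(colors, levels)
except ValueError as exc:
    message = str(exc)
```

One consequence of this approach is that indexing a defaultdict stores the fallback value under each missing key, so the palette object passed in is modified by the check.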
Edit: Removed non-functional alternate suggestions (apparently defaultdict.get doesn't behave the way I thought it did)
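For anyone else surprised by this, a short sketch of the difference between .get() and indexing on a defaultdict, using a palette like the one in the original example:

```python
from collections import defaultdict

palette = defaultdict(lambda: "#000000", {"foo": "#ff0000"})

# .get() bypasses default_factory: it returns None (or the explicit
# fallback passed to it) and does not insert the key.
via_get = palette.get("baz")
present_after_get = "baz" in palette

# Indexing calls default_factory through __missing__ and, as a side
# effect, stores the new key in the mapping.
via_index = palette["baz"]
present_after_index = "baz" in palette
```

This is why a check based on indexing sees the fallback color, while one based on .get() or on key membership does not.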
In any case, my point is that the current check is preventing us from using something as the palette which we would otherwise be able to, and which we currently do use for our other non-matplotlib plots (namely plotly). The changes I have suggested here would add more flexibility to the code without impacting the functionality of the missing key check, when users are passing a standard dict.