seaborn
categorical plots - unused categories mess up element spacing and width
Several of seaborn's functions for plotting categorical data don't cope well when the categories list includes unused categories.
I've noticed two main issues:
- element width shrinks
- element spacing doesn't match the ticks on the categorical axis.
It doesn't make a difference if you use vertical or horizontal orientation.
The issue only occurs when the same feature is used for the categorical x/y variable and for the hue. If no hue is provided, or if the hue uses a different feature, there is no issue.
The issues occur for sns.barplot, sns.boxplot, sns.boxenplot, and sns.violinplot, whereas sns.pointplot, sns.stripplot, and sns.swarmplot are fine.
I've reproduced the issue with the penguins dataset we all know and love from the seaborn docs. In the following MRE, the first column plots the raw penguins data, the second plots it after converting the string columns to categorical dtype (which also works fine), and the third plots it after adding an unused category to island, which causes the two issues above.
It looks as though it is failing to recognise that the hue and y are the same, so it makes space on the plot within each y for all the hues. This is what makes each element a) get squeezed, and b) not align nicely with the y ticks. Presumably the unused category is somehow the cause of the confusion.
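To make the "unused category" condition concrete, here is a minimal pandas-only sketch. It uses a small synthetic stand-in for the penguins island column (the rows are made up for illustration, not taken from the dataset) and shows that the declared category list keeps an entry that never appears in the data. Whether seaborn reads this list in exactly this way is my assumption, not something I have confirmed in its source.

```python
import pandas as pd

# Synthetic stand-in for the penguins "island" column (illustrative values only).
df = pd.DataFrame({
    "island": ["Biscoe", "Dream", "Torgersen", "Dream"],
    "body_mass_g": [4500.0, 3700.0, 3800.0, 3600.0],
})

# Convert to categorical, then declare a category that no row uses.
df["island"] = df["island"].astype("category")
df["island"] = df["island"].cat.add_categories(["Uninhabited Island"])

# Categories declared on the dtype vs. categories that actually occur in the data.
declared = list(df["island"].cat.categories)
observed = list(df["island"].cat.remove_unused_categories().cat.categories)
unused = [c for c in declared if c not in observed]

print("declared:", declared)
print("observed:", observed)
print("unused:  ", unused)
```

A check like this could be run on the real data before plotting to see whether the x/y and hue column carries any unused categories.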
Code to generate the above plot:
import matplotlib.pyplot as plt
import seaborn as sns
penguins = sns.load_dataset("penguins")
plotters = [sns.barplot, sns.boxplot, sns.boxenplot, sns.violinplot,
sns.pointplot, sns.stripplot, sns.swarmplot]
# with horizontal orientation
fig, axs = plt.subplots(ncols=3, nrows=len(plotters), figsize=(16, 3*len(plotters)), sharex=True, sharey=False)
kwargs = dict(data=penguins, x="body_mass_g", y="island", hue="island", legend=False,)
# If no hue is provided, or if the hue uses a different feature, there is no issue.
# kwargs = dict(data=penguins, x="body_mass_g", y="island", hue="sex", legend=True,)
# kwargs = dict(data=penguins, x="body_mass_g", y="island", legend=False,)
# same issue with vertical orientation
# fig, axs = plt.subplots(ncols=3, nrows=len(plotters), figsize=(16, 3*len(plotters)), sharex=False, sharey=True)
# kwargs = dict(data=penguins, x="island", y="body_mass_g", hue="island", legend=False,)
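for i, plotter in enumerate(plotters):
    axs[i, 1].set_title(plotter.__name__)
    # column 0: data as loaded (already categorical after the first iteration,
    # since penguins is modified in place below)
    plotter(ax=axs[i, 0], **kwargs)
    # column 1: object columns converted to categorical
    cat_cols = penguins.select_dtypes('O').columns
    penguins[cat_cols] = penguins[cat_cols].astype('category')
    plotter(ax=axs[i, 1], **kwargs)
    # column 2: same data with an extra, unused category on island
    penguins["island"] = penguins["island"].cat.add_categories(['Uninhabited Island '])
    plotter(ax=axs[i, 2], **kwargs)
    # drop the unused category again before the next plotter
    penguins["island"] = penguins["island"].cat.remove_unused_categories()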
plt.tight_layout()
plt.show()
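A possible workaround in the meantime, assuming the unused category does not need to appear on the axis, is to drop unused categories before plotting. This returns the data to the state of the second column in the MRE, which plots fine. The helper name drop_unused_categories is hypothetical (it is not part of seaborn or pandas), and the example data is synthetic.

```python
import pandas as pd


def drop_unused_categories(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with unused categories removed from every categorical column.

    Hypothetical helper for illustration; not part of seaborn or pandas.
    """
    out = df.copy()
    for col in out.select_dtypes("category").columns:
        out[col] = out[col].cat.remove_unused_categories()
    return out


# Synthetic example: "Uninhabited Island" is declared but never used.
df = pd.DataFrame({
    "island": pd.Categorical(
        ["Biscoe", "Dream", "Dream"],
        categories=["Biscoe", "Dream", "Uninhabited Island"],
    ),
    "body_mass_g": [4500.0, 3700.0, 3600.0],
})

cleaned = drop_unused_categories(df)
print(list(cleaned["island"].cat.categories))
```

The cleaned frame could then be passed as data= to the affected plotters. The original frame is left untouched, so the full category list is still available if it is needed elsewhere.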
Many thanks as always for the superb library!