categorical plots - unused categories mess up element spacing and width

Open Gabriel-Kissin opened this issue 7 months ago • 4 comments

Several of seaborn's functions for plotting categorical data don't cope well when the categories list includes unused categories.

I've noticed two main issues:

  1. element width shrinks
  2. element spacing doesn't match the tick positions on the categorical axis.

It doesn't make a difference if you use vertical or horizontal orientation.

The issue only occurs when the same feature is used for the categorical x/y variable and for the hue. If no hue is provided, or if the hue uses a different feature, there is no issue.

The issues occur with sns.barplot, sns.boxplot, sns.boxenplot, and sns.violinplot, whereas sns.pointplot, sns.stripplot, and sns.swarmplot are fine.

I've reproduced the issue with the penguins dataset we all know and love from the seaborn docs. In the following MRE, the first column is the raw penguins data. The second column is the same data after converting the string columns to categorical (this also works fine). The final column is after adding an unused category to the data, which causes the two issues above:

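[Image: grid of plots, one row per plotting function; the columns show the raw data, the categorical data, and the categorical data with an unused category]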

It looks as though it is failing to recognise that the hue and y are the same, so it makes space on the plot within each y for all the hues. This is what makes each element a) get squeezed, and b) not align nicely with the y ticks. Presumably the unused category is somehow the cause of the confusion.
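
To make the squeeze-and-shift effect concrete, here is a small arithmetic sketch of where elements would land if space were reserved at every tick for every hue level. It is a simplified model, not seaborn's actual code: the `dodged_layout` helper, the 0.8 group width, and the even split between hue levels are all assumptions made for illustration.

```python
def dodged_layout(n_positions, n_hue_levels, group_width=0.8):
    """Return (center, width) per category when category i uses hue slot i.

    Simplified model (an assumption, not seaborn's implementation): each
    tick at integer position i reserves `group_width`, split evenly
    between all hue levels, and category i is drawn in hue slot i.
    """
    each_width = group_width / n_hue_levels
    layout = []
    for i in range(n_positions):
        # Offset of hue slot i relative to its tick, centred on the tick.
        offset = (i + 0.5) * each_width - group_width / 2
        layout.append((round(i + offset, 3), round(each_width, 3)))
    return layout


# Three islands on the axis, but four hue levels (one of them unused).
print(dodged_layout(n_positions=3, n_hue_levels=4))
# [(-0.3, 0.2), (0.9, 0.2), (2.1, 0.2)]
```

In this model the three elements are 0.2 wide instead of 0.8, and they sit at -0.3, 0.9 and 2.1 rather than on the ticks at 0, 1 and 2, which matches both symptoms described above.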

Code to generate the above plot:

import matplotlib.pyplot as plt
import seaborn as sns

penguins = sns.load_dataset("penguins")

plotters = [sns.barplot, sns.boxplot, sns.boxenplot, sns.violinplot, 
            sns.pointplot, sns.stripplot, sns.swarmplot]

# with horizontal orientation
fig, axs = plt.subplots(ncols=3, nrows=len(plotters), figsize=(16, 3*len(plotters)), sharex=True, sharey=False)
kwargs = dict(data=penguins, x="body_mass_g", y="island", hue="island", legend=False,)

# If no hue is provided, or if the hue uses a different feature, there is no issue.
# kwargs = dict(data=penguins, x="body_mass_g", y="island", hue="sex", legend=True,)
# kwargs = dict(data=penguins, x="body_mass_g", y="island", legend=False,)

# same issue with vertical orientation
# fig, axs = plt.subplots(ncols=3, nrows=len(plotters), figsize=(16, 3*len(plotters)), sharex=False, sharey=True)
# kwargs = dict(data=penguins, x="island", y="body_mass_g", hue="island", legend=False,)

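# Build the three datasets once, on copies, so that every row of the figure
# compares the same inputs (modifying penguins inside the loop would leave
# only the first row with truly raw data in its first column).
cat_cols = penguins.select_dtypes('O').columns
categorical = penguins.copy()
categorical[cat_cols] = categorical[cat_cols].astype('category')
with_unused = categorical.copy()
with_unused["island"] = with_unused["island"].cat.add_categories(['Uninhabited Island'])

for i, plotter in enumerate(plotters):

    axs[i, 1].set_title(plotter.__name__)

    # columns: raw data, categorical data, categorical data with an unused category
    for j, data in enumerate([penguins, categorical, with_unused]):
        plotter(ax=axs[i, j], **{**kwargs, "data": data})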


plt.tight_layout()
plt.show()
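
For anyone who wants to check whether their own data is affected, an unused category can be spotted with pandas alone. This is a minimal sketch on a small made-up Series (the values are illustrative stand-ins, not the penguins data), using the pandas `cat.add_categories` and `cat.remove_unused_categories` accessors.

```python
import pandas as pd

# Small stand-in for the penguins "island" column (illustrative values).
island = pd.Series(["Biscoe", "Dream", "Torgersen", "Dream"], dtype="category")

# Add a category that no row uses, as the reproduction script does.
island = island.cat.add_categories(["Uninhabited Island"])

declared = list(island.cat.categories)  # every declared category
observed = sorted(island.unique())      # only categories present in the data
print(declared)
print(observed)

# Dropping unused categories makes the declared list match the data again.
cleaned = island.cat.remove_unused_categories()
print(list(cleaned.cat.categories))
```

Since the plots of the categorical data without the extra category look fine, calling `remove_unused_categories()` on the column before plotting may serve as a workaround, though that is a guess rather than a confirmed fix.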

Many thanks as always for the superb library!

Gabriel-Kissin, Jul 23 '24 14:07