`scatterplot` assigns incorrect dot size when `size` has only two unique values (0 and another number)

Open · 136s opened this issue 1 year ago · 1 comment

Description

When using sns.scatterplot with the size parameter, if size contains only 0 and one other number, the dots for 0 are drawn unexpectedly large. The problem does not occur when size contains three or more unique values, or when all values are nonzero.

Steps to reproduce

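import sys

import matplotlib
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Print the environment so the report can be matched to exact versions
print(f"Python: {sys.version}")
print(f"matplotlib: {matplotlib.__version__}")
print(f"numpy: {np.__version__}")
print(f"pandas: {pd.__version__}")
print(f"seaborn: {sns.__version__}")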


def plot_dot(val: list):
    _, ax = plt.subplots(figsize=(2, 2))
    data = pd.DataFrame({"X": ["x1", "x2", "x1", "x2"], "Y": ["y1", "y1", "y2", "y2"], "size": val})
    g = sns.scatterplot(data=data, x="X", y="Y", size="size", ax=ax)
    g.legend(loc="upper left", bbox_to_anchor=(1.05, 1.0))
    g.title.set_text(f"size={val}")


plot_dot([0, 0, 1, 1])  # 0 appears abnormally large
plot_dot([None, 0, 1, 1])  # 0 appears abnormally large
plot_dot([0, 0, 1, 2])  # 0 appears correctly
plot_dot([1, 1, 2, 2])  # nonzero values appear correctly

Observed behavior

Python: 3.13.2 | packaged by conda-forge | (main, Feb 17 2025, 14:02:48) [Clang 18.1.8 ]
matplotlib: 3.10.1
numpy: 2.2.4
pandas: 2.2.3
seaborn: 0.13.2

[Four images: one scatter plot per plot_dot call above]

Expected behavior

  • The dots for 0 should appear small and not disproportionately large, regardless of whether size contains only two unique values.

Additional information

  • This issue persists even when explicitly setting sizes=(10, 200).
  • The problem does not occur when there are more than two unique values in size.
  • A workaround is to include additional unique values in size, but this should not be necessary.

I would appreciate any insight into whether this is intended behavior or a bug in size scaling. Thanks!

136s, Mar 19 '25 06:03

Hi, this is intended behavior and not a bug in size scaling. The trigger is not the number of unique values but the values themselves: if every non-missing value in the size vector is 0 or 1, seaborn treats the variable as boolean-like and therefore categorical. Categorical levels are sorted and given evenly spaced sizes in reverse, so the first level (0) gets the largest dot and the last level (1) gets the smallest. Any other numeric vector, including one with only two unique values such as [1, 1, 2, 2], is treated as numeric: the smallest number gets the smallest dot, and the largest number gets the largest dot.
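To make the two modes concrete, here is a minimal pure-Python sketch. It is a simplified model and not seaborn's actual implementation: it assumes the categorical branch is taken when every non-missing value is 0 or 1, the function names (is_binary, categorical_sizes, numeric_sizes, size_lookup) are made up for illustration, and the (10, 200) range is borrowed from the sizes=(10, 200) example in the report.

```python
# Simplified model of the two size-mapping modes; NOT seaborn's real code.
# All function names here are hypothetical.


def linspace(lo, hi, n):
    """Return n evenly spaced values from lo to hi inclusive."""
    if n == 1:
        return [float(lo)]
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]


def is_binary(values):
    """True when every non-missing value is 0 or 1 (assumed trigger)."""
    present = [v for v in values if v is not None]
    return all(v in (0, 1) for v in present)


def categorical_sizes(values, size_range):
    """Sorted levels get a reversed ramp: first level -> largest size."""
    levels = sorted({v for v in values if v is not None})
    ramp = linspace(size_range[0], size_range[1], len(levels))[::-1]
    return dict(zip(levels, ramp))


def numeric_sizes(values, size_range):
    """Linearly rescale values: minimum -> smallest, maximum -> largest."""
    present = [v for v in values if v is not None]
    lo, hi = min(present), max(present)
    smin, smax = size_range
    if hi == lo:
        # Degenerate case; the choice of smax is arbitrary in this sketch.
        return {lo: float(smax)}
    return {v: smin + (v - lo) / (hi - lo) * (smax - smin) for v in set(present)}


def size_lookup(values, size_range=(10, 200)):
    """Pick the mapping mode, then return a value -> marker size table."""
    mapper = categorical_sizes if is_binary(values) else numeric_sizes
    return mapper(values, size_range)
```

In this model, size_lookup([0, 0, 1, 1]) and size_lookup([None, 0, 1, 1]) give 0 the largest size, while size_lookup([1, 1, 2, 2]) and size_lookup([0, 0, 1, 2]) give the smallest value the smallest size, which matches the four plots in the report.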

Therefore, if your values are only 0 and 1 and you want 0 to get the smaller dot, set the mapping explicitly instead of relying on the default: pass size_order=[1, 0] so that 1 comes first and receives the largest dot, or pass sizes as a dict such as {0: 10, 1: 200}. Reordering the rows is unlikely to help, because numeric levels are sorted before sizes are assigned. If you want full control over the legend, you should create a custom legend.
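One way to pin the mapping down is sketched below. It assumes you want 0 drawn at the low end and 1 at the high end of the (10, 200) range mentioned in the report, and it passes sizes as a dict from level to marker area. The DataFrame reuses the reproduction data; the Agg backend line is only there so the snippet runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend; drop this line for interactive use

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

data = pd.DataFrame({
    "X": ["x1", "x2", "x1", "x2"],
    "Y": ["y1", "y1", "y2", "y2"],
    "size": [0, 0, 1, 1],
})

_, ax = plt.subplots(figsize=(2, 2))

# A dict for `sizes` pins each level to an explicit marker area,
# so the default reversed ramp for 0/1 data no longer applies.
sns.scatterplot(
    data=data, x="X", y="Y", size="size", sizes={0: 10, 1: 200}, ax=ax
)

# Marker areas that were actually assigned to the plotted points.
drawn = ax.collections[0].get_sizes()
print(drawn)
```

Printing drawn is a quick way to confirm which size each point received without judging it by eye from the figure.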

phoebecd, May 18 '25 01:05