`scatterplot` assigns incorrect dot size when `size` has only two unique values (0 and another number)
Description
When using `sns.scatterplot` with the `size` parameter, if the only unique values in `size` are 0 and one other number, the dots for 0 are drawn unexpectedly large. This issue does not occur when `size` contains more than two unique values.
Steps to reproduce
import sys
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

print(f"Python: {sys.version}")
print(f"matplotlib: {plt.matplotlib.__version__}")
print(f"numpy: {np.__version__}")
print(f"pandas: {pd.__version__}")
print(f"seaborn: {sns.__version__}")

def plot_dot(val: list):
    # Draw four points whose marker size is driven by `val`
    _, ax = plt.subplots(figsize=(2, 2))
    data = pd.DataFrame({"X": ["x1", "x2", "x1", "x2"], "Y": ["y1", "y1", "y2", "y2"], "size": val})
    g = sns.scatterplot(data=data, x="X", y="Y", size="size", ax=ax)
    g.legend(loc="upper left", bbox_to_anchor=(1.05, 1.0))
    g.title.set_text(f"size={val}")

plot_dot([0, 0, 1, 1])  # 0 appears abnormally large
plot_dot([None, 0, 1, 1])  # 0 appears abnormally large
plot_dot([0, 0, 1, 2])  # 0 appears correctly
plot_dot([1, 1, 2, 2])  # nonzero values appear correctly
Observed behavior
Python: 3.13.2 | packaged by conda-forge | (main, Feb 17 2025, 14:02:48) [Clang 18.1.8 ]
matplotlib: 3.10.1
numpy: 2.2.4
pandas: 2.2.3
seaborn: 0.13.2
Expected behavior
- The dots for `0` should appear small and not disproportionately large, regardless of whether `size` contains only two unique values.
Additional information
- This issue persists even when explicitly setting `sizes=(10, 200)`.
- The problem does not occur when there are more than two unique values in `size`.
- A workaround is to include additional unique values in `size`, but this should not be necessary.
I would appreciate any insight into whether this is intended behavior or a bug in size scaling. Thanks!
Hi, this is intended behavior and not a bug in size scaling. If the size vector contains only the values 0 and 1 (ignoring missing values), seaborn treats it as boolean-like and therefore as categorical data. For categorical sizes, the first level gets the largest dot and the last level gets the smallest, so 0 is drawn large and 1 small. Otherwise, as with [0, 1, 2] or [1, 2], seaborn treats the values as numeric: the smallest number gets the smallest dot, and the largest number gets the largest dot.
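To make the two cases concrete, here is a minimal pure-Python sketch of the rule. It is a simplified model, not seaborn's actual code: the function name `assign_sizes` is hypothetical, and it assumes that the categorical branch is taken when every non-missing value is 0 or 1 and that numeric sizes are interpolated linearly between the two ends of `sizes`.

```python
def assign_sizes(values, sizes=(10, 200)):
    """Hypothetical model: map each unique value to a marker size."""
    present = [v for v in values if v is not None]  # ignore missing values
    levels = sorted(set(present))
    lo, hi = sizes

    if all(v in (0, 1) for v in present):
        # Assumed categorical branch: evenly spaced sizes, first level largest.
        n = len(levels)
        ramp = [lo + (hi - lo) * i / (n - 1) for i in range(n)] if n > 1 else [lo]
        return dict(zip(levels, reversed(ramp)))

    # Assumed numeric branch: linear interpolation from the minimum to the maximum.
    vmin, vmax = levels[0], levels[-1]
    span = (vmax - vmin) or 1  # avoid dividing by zero for a single level
    return {v: lo + (hi - lo) * (v - vmin) / span for v in levels}


print(assign_sizes([0, 0, 1, 1]))     # {0: 200.0, 1: 10.0}
print(assign_sizes([None, 0, 1, 1]))  # {0: 200.0, 1: 10.0}
print(assign_sizes([0, 0, 1, 2]))     # {0: 10.0, 1: 105.0, 2: 200.0}
print(assign_sizes([1, 1, 2, 2]))     # {1: 10.0, 2: 200.0}
```

Under these assumptions the model reproduces all four cases from the report: 0 is large only when the values are limited to 0 and 1.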
Therefore, if your values are only 0 and 1 and you want the smallest dot assigned to 0 (instead of 1), 1 has to be the first level. Reordering the rows is probably not enough, because numeric levels appear to be sorted; setting `size_order=[1, 0]` states the order explicitly. Also, check the legend after reversing the 0s and 1s, as it may not match what you expect from the scatter plot. If you want full control over the legend, you should create a custom legend.
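Another option that avoids relying on level order is to pass `sizes` as a dict, which the `scatterplot` documentation describes as providing one size per unique data value. The sketch below reuses the data from the report; the marker areas in `size_map` are arbitrary values chosen for illustration, and the `Agg` backend is selected only so that the script runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

data = pd.DataFrame({
    "X": ["x1", "x2", "x1", "x2"],
    "Y": ["y1", "y1", "y2", "y2"],
    "size": [0, 0, 1, 1],
})

# Illustrative marker areas: small for 0, large for 1.
size_map = {0: 20, 1: 200}

_, ax = plt.subplots(figsize=(2, 2))
sns.scatterplot(data=data, x="X", y="Y", size="size", sizes=size_map, ax=ax)

# Inspect the marker areas that were handed to matplotlib.
drawn = [float(s) for s in ax.collections[0].get_sizes()]
print(drawn)
```

Reading the sizes back from `ax.collections[0]` is a quick way to confirm the assignment numerically instead of judging it by eye.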