Compact beeswarm layout

Open jtrim-ons opened this issue 3 years ago • 6 comments

I imagine that this may not fit in with the development priorities of Seaborn at the moment, but I'm opening the issue just in case.

I came up with an idea recently for a more compact layout algorithm for swarmplots that involves dynamically choosing which point to place next. I wondered if it might be useful as an alternative to Seaborn's swarmplot. I would be happy to have a go coding it up if you feel that it might be something you would like to have in Seaborn.
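As a rough illustration of what "dynamically choosing which point to place next" could look like, here is a minimal sketch. The selection rule (always place the unplaced point that can sit closest to the centre line) and the function name `compact_swarm_sketch` are assumptions made for this example, not details stated in this thread, and the brute-force search is far slower than a real implementation would be.

```python
import math


def compact_swarm_sketch(values, diameter):
    """Return one cross-axis offset per value so that no two circles overlap.

    Hypothetical sketch: the "closest to the centre line first" rule is an
    assumption, not a rule confirmed in this thread.
    """
    placed = []                          # (offset, value) of circles already positioned
    offsets = [0.0] * len(values)
    remaining = set(range(len(values)))
    eps = 1e-9 * diameter ** 2           # tolerance for circles that exactly touch

    def best_offset(value):
        # Candidates: the centre line, or positions just touching a placed neighbour.
        candidates = [0.0]
        for px, pv in placed:
            dv = abs(value - pv)
            if dv < diameter:
                dx = math.sqrt(diameter ** 2 - dv ** 2)
                candidates += [px - dx, px + dx]
        # Take the feasible candidate nearest the centre line.
        for c in sorted(candidates, key=abs):
            if all((c - px) ** 2 + (value - pv) ** 2 >= diameter ** 2 - eps
                   for px, pv in placed):
                return c
        raise RuntimeError("unreachable: the outermost candidate is always free")

    while remaining:
        # Dynamic choice: pick whichever unplaced point can currently sit nearest the centre.
        i = min(sorted(remaining), key=lambda j: abs(best_offset(values[j])))
        offsets[i] = best_offset(values[i])
        placed.append((offsets[i], values[i]))
        remaining.remove(i)
    return offsets
```

Calling `compact_swarm_sketch([0.0, 0.1, 0.2, 0.3, 1.5], diameter=1.0)` returns offsets along the axis perpendicular to the values; the classic approach would instead place the points in a fixed order.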

It's probably easiest to share this R beeswarm issue which has a couple of examples and briefly describes the approach. I've also done a more detailed write-up on Observable.

The main downside I can see of this new approach is that it's not trivial to have circles with different radii.

I won't be offended at all if you just close this issue, but I would be happy to discuss or try to implement it if you think it might be nice to include in Seaborn.

jtrim-ons avatar Apr 16 '21 12:04 jtrim-ons

The visual appearance of your approach is nice. It would be helpful to have some quantification of the benefits/costs. How many more observations can you pack into the same horizontal space (say for gaussian/uniform/lognormal variables) with this approach? And how much slower would the algorithm be?

mwaskom avatar Apr 16 '21 13:04 mwaskom

Thanks @mwaskom. I’ll try to get you some answers within the next week.

jamestrimble avatar Apr 16 '21 14:04 jamestrimble

This really looks interesting. Something else that could be relevant is "semi-discrete" data. Say, people's heights are measured in whole centimeters, while height is clearly a continuous quantity. In a swarmplot (or a stripplot), the discreteness often pops up.

This is also related to the "jitter range", which is described in e.g. the scatterplot documentation as "currently non-functional".

Here is an example where the discreteness pops up, and probably could be mitigated via some jittering setting:

import seaborn as sns
iris = sns.load_dataset('iris')
sns.swarmplot(data=iris, y='petal_width')

Does the new algorithm also take multiple hues into account? As in

import numpy as np
sns.swarmplot(x=np.ones(200), y=np.random.randn(200), hue=np.tile([0, 1, 2, 3], 50), palette='turbo')

jhncls avatar Apr 16 '21 16:04 jhncls

Jittering on a continuous axis is a bad idea. The quasi-discreteness of the petal widths is a true feature of the data and to jitter would be to obscure it. (It can be useful in obviously-discrete numeric contexts, though.)

mwaskom avatar Apr 16 '21 17:04 mwaskom

@jhncls It works with multiple hues.

jtrim-ons avatar Apr 19 '21 19:04 jtrim-ons

@mwaskom I now have some numbers! It turned out quite differently from how I expected. The compact swarmplot isn't much more compact than the "classic" swarmplot, but it's faster.

I generated 450 instances: 150 each of normal, uniform and lognormal, with sizes in {100, 200, ..., 3000}. If we denote by width the horizontal distance between the leftmost and rightmost points, then the compact beeswarm is on average 3.6% narrower and 20% faster than the classic beeswarm.
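For anyone wanting to reproduce a setup like this, a minimal sketch follows. Five repetitions per size is inferred from 150 / 30 sizes; the seed, the distribution parameters, and the names `make_instances`, `swarm_width` and `percent_narrower` are assumptions for illustration and may differ from the code actually used.

```python
import numpy as np


def make_instances(seed=0, reps=5):
    """Build (distribution, size, sample) triples: 3 distributions x 30 sizes x reps."""
    rng = np.random.default_rng(seed)
    samplers = {
        "normal": lambda n: rng.standard_normal(n),
        "uniform": lambda n: rng.random(n),
        "lognormal": lambda n: rng.lognormal(size=n),   # default mean/sigma: an assumption
    }
    sizes = range(100, 3001, 100)                       # {100, 200, ..., 3000}
    return [(name, n, draw(n))
            for name, draw in samplers.items()
            for n in sizes
            for _ in range(reps)]


def swarm_width(offsets):
    """Horizontal distance between the leftmost and rightmost points."""
    return float(np.max(offsets) - np.min(offsets))


def percent_narrower(compact_width, classic_width):
    """How much narrower the compact layout is, as a percentage of the classic width."""
    return 100.0 * (1.0 - compact_width / classic_width)
```

Here `offsets` stands for the positions a layout algorithm assigns along the axis perpendicular to the values; the layout functions themselves are not part of this sketch.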

These are the widths, with one point in the scatter plot for each instance:

[Image: plot-width-scatter]

These are the run times in seconds:

[Image: time-scatter]

And this is the ratio of compact beeswarm width to classic beeswarm width, with the number of circles in the beeswarm shown on the x axis. (I should probably have used a swarmplot for this :-) )

[Image: width-ratio]

I think one reason that the compact beeswarm makes so little difference to the width is that the classic beeswarm is a little more squashed along the value axis. My Python implementation of the compact beeswarm ensures that the centres of circles are at least 1.05 * 2 * r apart. My understanding of this line from Seaborn is that the swarmplot allows circle centres to be 2 * r apart along the value axis, but when stacking circles along the other axis it enforces a gap of 1.05 * 2 * r.
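To make the difference concrete, here is a small sketch of the two spacing rules as described in the paragraph above. It reflects that description rather than either implementation's actual code, and the function names are hypothetical. `dv` is the gap between two circles along the value axis and `r` is the circle radius.

```python
import math

GAP = 1.05  # spacing factor mentioned above


def classic_cross_offset(dv, r):
    """Classic rule as described: exact touching distance, then scale the cross-axis offset."""
    d = 2 * r
    return GAP * math.sqrt(max(d ** 2 - dv ** 2, 0.0))


def compact_cross_offset(dv, r):
    """Compact rule as described: centres must be at least 1.05 * 2 * r apart in every direction."""
    d = GAP * 2 * r
    return math.sqrt(max(d ** 2 - dv ** 2, 0.0))


# Two circles of radius 1 whose values differ by 1.9:
dv, r = 1.9, 1.0
classic_dist = math.hypot(classic_cross_offset(dv, r), dv)   # centre-to-centre distance
compact_dist = math.hypot(compact_cross_offset(dv, r), dv)
print(f"classic: {classic_dist:.3f}, compact: {compact_dist:.3f}")
```

With the same value gap, the classic rule lets the centres sit closer than 2.1 * r, while the compact rule keeps them exactly 2.1 * r apart, which is consistent with the classic swarm being slightly more squashed along the value axis.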

My code is here. I've just packed everything into a single function for now; it's probably not good style!

This has been a useful exercise for me, but I'll not be offended at all if you'd prefer not to add the extra lines of code to Seaborn.

jtrim-ons avatar Apr 19 '21 20:04 jtrim-ons