seaborn
Compact beeswarm layout
I imagine that this may not fit in with the development priorities of Seaborn at the moment, but I'm opening the issue just in case.
I came up with an idea recently for a more compact layout algorithm for swarmplots that involves dynamically choosing which point to place next. I wondered if it might be useful as an alternative to Seaborn's swarmplot. I would be happy to have a go coding it up if you feel that it might be something you would like to have in Seaborn.
It's probably easiest to share this R beeswarm issue which has a couple of examples and briefly describes the approach. I've also done a more detailed write-up on Observable.
The main downside I can see of this new approach is that it's not trivial to have circles with different radii.
I won't be offended at all if you just close this issue, but I would be happy to discuss or try to implement it if you think it might be nice to include in Seaborn.
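For readers following along, here is a rough sketch of what "dynamically choosing which point to place next" could look like. It is an assumption based on the one-line description above, not the author's implementation: at each step it scores every unplaced point by how close to the centre line it could sit without overlapping the circles already placed, then places the best one. The function name `compact_swarm_offsets` and its parameters are hypothetical, and the brute-force search is far too slow for real use.

```python
import numpy as np

def compact_swarm_offsets(values, diameter):
    """Greedy sketch: always place the remaining point that can sit nearest the centre line."""
    values = np.asarray(values, dtype=float)
    n = len(values)
    offsets = np.full(n, np.nan)
    placed = []  # indices of points that already have an offset

    def best_offset(i):
        # Only placed circles closer than one diameter along the value axis can collide.
        near = [j for j in placed if abs(values[j] - values[i]) < diameter]
        # Candidates: the centre line, plus the two positions touching each nearby circle.
        candidates = [0.0]
        for j in near:
            dy = values[j] - values[i]
            dx = np.sqrt(diameter**2 - dy**2)
            candidates += [offsets[j] - dx, offsets[j] + dx]
        # Keep candidates that overlap nothing (small tolerance for rounding error).
        limit = diameter**2 * (1 - 1e-9)
        feasible = [
            x for x in candidates
            if all((x - offsets[j])**2 + (values[j] - values[i])**2 >= limit for j in near)
        ]
        return min(feasible, key=abs)

    remaining = set(range(n))
    while remaining:
        # The dynamic step: re-evaluate every unplaced point, take the one nearest the centre.
        i, x = min(((k, best_offset(k)) for k in remaining),
                   key=lambda t: (abs(t[1]), t[0]))
        offsets[i] = x
        placed.append(i)
        remaining.remove(i)
    return offsets
```

A call such as `compact_swarm_offsets(np.random.default_rng(0).standard_normal(50), 0.3)` returns one sideways offset per value. A single shared `diameter` is assumed throughout, which matches the caveat above about circles with different radii.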
The visual appearance of your approach is nice. It would be helpful to have some quantification of the benefits and costs. How many more observations can you pack into the same horizontal space (say, for Gaussian, uniform, and lognormal variables) with this approach? And how much slower would the algorithm be?
Thanks @mwaskom. I’ll try to get you some answers within the next week.
This looks really interesting. Something else that could be relevant is "semi-discrete" data. Say the heights of people are measured in whole centimeters, even though height is clearly a continuous quantity. In a swarmplot (or a stripplot), that discreteness often shows up.
This is also related to the "jitter range" that is mentioned in, e.g., the scatterplot documentation as "currently non-functional".
Here is an example where the discreteness pops up, and probably could be mitigated via some jittering setting:
import seaborn as sns
iris = sns.load_dataset('iris')
sns.swarmplot(data=iris, y='petal_width')
Does the new algorithm also take multiple hues into account? As in
import numpy as np
sns.swarmplot(x=np.ones(200), y=np.random.randn(200), hue=np.tile([0, 1, 2, 3], 50), palette='turbo')
Jittering on a continuous axis is illegal. The quasi-discreteness of the petal widths is a true feature of the data, and to jitter would be to obscure it. (It can be useful in obviously discrete numeric contexts, though.)
@jhncls It works with multiple hues.
@mwaskom I now have some numbers! It turned out quite differently from how I expected. The compact swarmplot isn't much more compact than the "classic" swarmplot, but it's faster.
I generated 450 instances: 150 each of normal, uniform and lognormal, with sizes in {100, 200, ..., 3000}. If we denote by width the horizontal distance between the leftmost and rightmost points, then the compact beeswarm is on average 3.6% narrower and 20% faster than the classic beeswarm.
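A setup along these lines could be sketched as follows. This is not the script that produced the numbers above. The figure of five repeats per size is an assumption chosen so that 30 sizes give 150 instances per distribution, and the seed, the distribution parameters, and the helper names (`make_instances`, `swarm_width`, `benchmark`) are all hypothetical. `layout` stands for any function that maps values and a circle diameter to sideways offsets.

```python
import time
import numpy as np

rng = np.random.default_rng(0)  # assumed seed, for reproducibility only

# Default parameters for each distribution are an assumption.
SAMPLERS = {
    "normal": rng.standard_normal,
    "uniform": rng.random,
    "lognormal": lambda n: rng.lognormal(size=n),
}
SIZES = range(100, 3001, 100)  # 100, 200, ..., 3000
REPEATS = 5  # assumption: 30 sizes x 5 repeats = 150 instances per distribution

def make_instances():
    """Yield (distribution name, sample) pairs, 450 in total."""
    for name, sample in SAMPLERS.items():
        for n in SIZES:
            for _ in range(REPEATS):
                yield name, sample(n)

def swarm_width(offsets):
    """Distance between the leftmost and rightmost circle centres."""
    return float(np.max(offsets) - np.min(offsets))

def benchmark(layout, values, diameter):
    """Return (width, seconds) for one layout function on one instance."""
    start = time.perf_counter()
    offsets = layout(values, diameter)
    elapsed = time.perf_counter() - start
    return swarm_width(offsets), elapsed
```

Running `benchmark` once per instance for each of the two layouts, then dividing the widths and times pairwise, gives ratios of the kind quoted above.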
These are the widths, with one point in the scatter plot for each instance:
These are the run times in seconds:
And this is the ratio of compact beeswarm width to classic beeswarm width, with the number of circles in the beeswarm shown on the x axis. (I should probably have used a swarmplot for this :-) )
I think one reason that the compact beeswarm makes so little difference to the width is that the classic beeswarm is a little more squashed along the value axis. My Python implementation of the compact beeswarm ensures that the centres of circles are at least 1.05 * 2 * r apart. My understanding of this line from Seaborn is that the swarmplot allows circle centres to be 2 * r apart along the value axis, but when stacking circles along the other axis it enforces a gap of 1.05 * 2 * r.
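To make the difference between the two spacing rules concrete, here is a small sketch of each as described in the paragraph above. It is a reading of that description, not code taken from either implementation, and the function names are hypothetical. Given two circles of radius `r` whose centres are `dy` apart along the value axis, each function returns the sideways offset the rule would demand.

```python
import math

def dx_scaled_offset(dy, r, pad=1.05):
    """'Classic' reading: find the touching offset for centre distance 2r,
    then stretch only that sideways offset by the padding factor."""
    d = 2 * r
    return pad * math.sqrt(max(d**2 - dy**2, 0.0))

def dx_padded_distance(dy, r, pad=1.05):
    """'Compact' reading: keep centres at least pad * 2r apart in every direction."""
    d = pad * 2 * r
    return math.sqrt(max(d**2 - dy**2, 0.0))

if __name__ == "__main__":
    r = 1.0
    for dy in (0.0, 1.0, 1.9, 2.0):
        print(dy, dx_scaled_offset(dy, r), dx_padded_distance(dy, r))
```

Both rules agree when `dy` is zero, but the padded-distance rule asks for a larger sideways offset at every other `dy`. In particular, two circles exactly `2 * r` apart along the value axis can share a column under the first rule and not under the second, which is consistent with the classic layout looking slightly more squashed along the value axis.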
My code is here. I've just packed everything into a single function for now; it's probably not good style!
This has been a useful exercise for me, but I'll not be offended at all if you'd prefer not to add the extra lines of code to Seaborn.