mplsoccer
Add a scatter density chart method
Add a scatter density method to the Pitch classes.
Inspiration:
- https://twitter.com/etmckinley/status/1169256582145703937
- https://github.com/LKremer/ggpointdensity
In principle, this could be added by using the c and cmap arguments of Axes.scatter: calculate the density via kernel density estimation and pass it as the c argument.
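A minimal sketch of that idea, assuming scipy.stats.gaussian_kde as the kernel density estimator (the issue does not say which estimator to use). The function name kde_density and the sample data are hypothetical and only for illustration:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde


def kde_density(x, y):
    """Estimate the point density at each (x, y) location with a Gaussian KDE.

    Hypothetical helper, not part of mplsoccer.
    """
    xy = np.vstack([x, y])       # shape (2, n), as gaussian_kde expects
    return gaussian_kde(xy)(xy)  # evaluate the fitted KDE at the points themselves


# Made-up sample data standing in for event locations on a pitch
rng = np.random.default_rng(42)
x = rng.normal(60, 15, 2000)
y = rng.normal(40, 10, 2000)

density = kde_density(x, y)

fig, ax = plt.subplots()
# The density values drive the colour of each marker through c and cmap
sc = ax.scatter(x, y, c=density, cmap="viridis", s=10)
fig.colorbar(sc, ax=ax, label="estimated density")
```

Evaluating the KDE at every input point scales roughly quadratically with the number of points, so this approach is only practical for modest n.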
I've investigated further what happens when you pass a very large number of points into #34.
For n < 20,000 it renders within a couple of seconds on my mid-range 2019 laptop, so the user experience is fine in those situations. Beyond that, however, performance degrades quite rapidly: n = 150,000 takes several minutes, which is probably unacceptable.
Therefore I've investigated the following alternatives from https://stackoverflow.com/questions/20105364/how-can-i-make-a-scatter-plot-colored-by-density-in-matplotlib:
Option 1. mpl-scatter-density
Pros:
- Lightning fast, can easily handle millions of points
Cons:
- Passing in a cmap is fiddly: locations with no points are still filled in, so to get around that you have to define your own cmap with white/transparent at zero
- I had to play around with the dpi setting to get nice-looking results when n is small
Option 2. np.histogram2d
Pros:
- Still very fast, can plot 150,000 points in under a second
- Can pass in any cmap easily
- Looks good with both high and low number of points
- The implementation in https://stackoverflow.com/a/53865762/3015186 has a built-in option to sort the points by density, which improves the visual appearance
- No additional library required, as it uses only numpy and scipy
Cons:
- Nothing major
With this in mind, I propose to resubmit #34 with option 2 implemented, which gives a significant speed improvement and the option to plot the highest-density points last so they are on top.
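For reference, a rough sketch of option 2 along the lines of the linked Stack Overflow answer: bin the points with np.histogram2d, then interpolate the binned density back onto each point with scipy.interpolate.interpn. The function name histogram_density, its parameters and the sample data are hypothetical, and this is not necessarily how #34 will implement it:

```python
import numpy as np
from scipy.interpolate import interpn


def histogram_density(x, y, bins=20, sort=True):
    """Approximate the density at each point from a 2D histogram.

    Hypothetical helper, not part of mplsoccer.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    # Bin the points; density=True normalises the counts to a density
    hist, x_edges, y_edges = np.histogram2d(x, y, bins=bins, density=True)
    x_centres = 0.5 * (x_edges[1:] + x_edges[:-1])
    y_centres = 0.5 * (y_edges[1:] + y_edges[:-1])

    # Interpolate the binned density at each point's location
    density = interpn((x_centres, y_centres), hist, np.column_stack([x, y]),
                      method="linear", bounds_error=False)
    # Points outside the outermost bin centres come back as NaN; treat them as zero
    density = np.nan_to_num(density, nan=0.0)

    if sort:
        # Ascending order, so the densest points are drawn last and end up on top
        order = np.argsort(density)
        x, y, density = x[order], y[order], density[order]
    return x, y, density


# Made-up sample data at the size that was problematic for the KDE approach
rng = np.random.default_rng(0)
x = rng.normal(60, 15, 150_000)
y = rng.normal(40, 10, 150_000)
xs, ys, dens = histogram_density(x, y)
```

The returned arrays can be passed straight to Axes.scatter, for example ax.scatter(xs, ys, c=dens, cmap="viridis"). The cost is dominated by the binning and a single interpolation pass, so it grows roughly linearly with the number of points.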