proplot
Add KDE functionality to hist and hist2d plots
I'd like to add KDE (kernel density estimation) functionality for the 1D and 2D histogram plotting functions, hist, hist2d, and maybe hexbin. Users can then optionally add marginal distribution panels with panel_axes.
Currently, the only matplotlib plotting function supporting KDE is violinplot, but the result is often gross: the "violins" do not smoothly taper to zero-width tails like in seaborn. Instead they abruptly cut off at the distribution minimum/maximum. So we shouldn't try to use the existing KDE engine. We should implement a new KDE engine, similar to seaborn's, and use it to power hist, hist2d, and violinplot. This may involve writing a new violinplot from scratch.
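To make the tapering point concrete, here is a minimal NumPy-only sketch of a 1D Gaussian KDE whose evaluation grid extends past the data extremes, so the density decays toward zero instead of being cut off at the minimum/maximum. The function name kde_1d and the parameters cut and gridsize are hypothetical (cut is modelled on the seaborn option of the same name), and the Scott's-rule bandwidth is just one possible default, not a proposal for the final implementation.

```python
import numpy as np


def kde_1d(data, cut=3.0, gridsize=200):
    """Gaussian KDE on a grid extended `cut` bandwidths past the data range.

    Hypothetical helper for illustration only, not part of proplot.
    """
    data = np.asarray(data, dtype=float)
    n = data.size
    # Scott's rule of thumb for the bandwidth (an assumed default).
    bw = data.std(ddof=1) * n ** (-1 / 5)
    # Extending the grid beyond min/max is what lets the tails taper.
    grid = np.linspace(data.min() - cut * bw, data.max() + cut * bw, gridsize)
    # Sum one Gaussian kernel per sample, then normalise to a density.
    z = (grid[:, None] - data[None, :]) / bw
    density = np.exp(-0.5 * z ** 2).sum(axis=1) / (n * bw * np.sqrt(2 * np.pi))
    return grid, density


rng = np.random.default_rng(0)
sample = rng.normal(size=500)
grid, density = kde_1d(sample)
```

With cut=0 the grid stops at the data extremes, which reproduces the abrupt cut-off described above; larger values trade a longer tail for a wider plot.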
It's a very nice feature! I can hardly wait!
I made some custom KDE graphs for a publication recently:
This type of graph is called a 'raincloud plot': https://wellcomeopenresearch.org/articles/4-63 (the paper has a Python implementation, but it is based on seaborn and therefore has the same problems). The code for the KDE part of the graph is here: https://github.com/Jhsmit/PyHDX-paper/blob/master/biorxiv_v2/functions/rainbows.py
Feel free to use the code in proplot if you find it useful (although some parts are from joyplot). I'm using scipy's kde function, which mostly works fine, but it can be slow if you have a lot of datapoints, especially in the 2D case.
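For reference, a minimal sketch of the 2D case with scipy.stats.gaussian_kde. This is not the code from the linked repository; the sample data, grid size, and subsample size are arbitrary assumptions. Each evaluation point is compared against every data point, so the cost grows roughly with n_points times n_grid, which is where the slowness comes from. Fitting on a random subsample is one simple (and lossy) way to reduce that cost.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
# gaussian_kde expects data with shape (n_dims, n_points).
xy = rng.normal(size=(2, 2000))

kde = gaussian_kde(xy)

# Evaluate the density on a regular 50 x 50 grid.
xg = np.linspace(-4, 4, 50)
yg = np.linspace(-4, 4, 50)
xx, yy = np.meshgrid(xg, yg)
zz = kde(np.vstack([xx.ravel(), yy.ravel()])).reshape(xx.shape)

# Assumed mitigation: fit on a random subsample to cut the evaluation cost.
idx = rng.choice(xy.shape[1], size=500, replace=False)
kde_small = gaussian_kde(xy[:, idx])
zz_small = kde_small(np.vstack([xx.ravel(), yy.ravel()])).reshape(xx.shape)
```

The resulting zz array can be passed directly to contour or contourf together with xg and yg.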
I might want to try to find time to make a PR myself, I'm a big fan of proplot. I've started using it for my last publication and the subplot layout and sizing options in proplot really made my life a lot easier :) (paper / code)
I'll try to provide some feedback to proplot if that helps you. PS: perhaps you could consider connecting your repository to Zenodo so that the project can be cited.
Thanks for the code! This is a good base for adding KDE functionality -- I probably won't have time to work on this until later this year but happy to accept PRs if you feel inclined/want it sooner. Proplot's source code recently underwent some major improvements so it should be much easier to contribute.
We probably want to add the following:
- Add a shared helper function at the top of axes/plot.py that controls KDE estimation for various plotting functions. Users should be able to pass keyword arguments to the KDE algorithm from the plotting functions.
- Add a violinplot option to plot a left- or right-half violin (like in your example), maybe with the arguments side='left' and side='right' (or side='top' or side='bottom' for horizontal violins), with side='both' being the default.
- Rewrite violinplot to use your method for KDE estimation rather than matplotlib's method. It would probably simply call fill_between or fill_betweenx, and then you can add outlines to the violins like you would any other patch. It would still be able to add error bars/boxes using the shared PlotAxes._apply_bar method.
- Add a raincloudplot method (with the shorthand raincloud, consistent with other plotting commands) as a thin wrapper that calls boxplot, violinplot, and scatter. It would call boxplot and violinplot with reduced default widths arguments and default side='left' or side='top' for the violins.
- Make violinplot have no colormap gradations by default, but let users add them by passing cmap='name' to violinplot or raincloudplot (it should also accept vmin and vmax arguments, but set the default vmin and vmax to the minimum and maximum of all the distributions). To implement colormap gradations, violinplot will set the facecolor of the patch to 'none' (i.e., completely transparent) so that an imshow can be drawn underneath the patch border and "clipped" by the border coordinates, as you've done in your code.
- Add kdeplot and kdeplot2d commands (with shorthands kde and kde2d, consistent with other functions) that show KDE estimations using lines and contours (respectively). They should be thin wrappers around plot and contour/contourf, similar to how hist and hist2d are thin wrappers around bar and pcolor.
- Add the ability to pass kde=True to hist and hist2d, which will draw the kde and kde2d lines on top of the histograms. This is analogous to the current ability to pass linewidth=N to contourf, where proplot adds an additional contour plot on top of the filled contours. KDE-algorithm or KDE-styling keywords could be passed to hist and hist2d with kde_kw={key: value, ...}, analogous to various other arguments ending in _kw.
- Update the user guide with lots of examples! By the time all of these features are added we'd probably need a "Statistical plotting" section separate from the current "1d plotting" and "2d plotting" sections.
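As a rough illustration of the half-violin idea in the list above, the following sketch draws a vertical violin with plain matplotlib's fill_betweenx and a side argument. Everything here is an assumption for illustration: half_violin, side, width, and cut are hypothetical names, the KDE comes from scipy.stats.gaussian_kde rather than a proplot helper, and nothing in this block is existing proplot API.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import gaussian_kde


def half_violin(ax, data, position, width=0.4, side="both", cut=3.0, **kwargs):
    """Draw a vertical violin at `position`; hypothetical helper, not proplot API."""
    if side not in ("left", "right", "both"):
        raise ValueError(f"Invalid side {side!r}.")
    data = np.asarray(data, dtype=float)
    kde = gaussian_kde(data)
    # In 1D the kernel standard deviation is factor * sample standard deviation.
    bw = kde.factor * data.std(ddof=1)
    # Extend the grid past the data so the violin tapers to zero width.
    y = np.linspace(data.min() - cut * bw, data.max() + cut * bw, 200)
    dens = kde(y)
    half = width * dens / dens.max()
    center = np.full_like(y, position)
    x1 = center - half if side in ("left", "both") else center
    x2 = center + half if side in ("right", "both") else center
    # The result is a regular patch collection, so edgecolor etc. work as usual.
    return ax.fill_betweenx(y, x1, x2, **kwargs)


rng = np.random.default_rng(0)
sample = rng.normal(size=300)
fig, ax = plt.subplots()
left_poly = half_violin(ax, sample, 1.0, side="left", facecolor="C0", edgecolor="k")
both_poly = half_violin(ax, sample, 2.0, side="both", facecolor="C1", edgecolor="k")
```

A horizontal variant with side='top' or side='bottom' would swap in fill_between, and a raincloud-style wrapper could call this with a reduced width next to a narrow boxplot and a scatter of the raw points.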
And glad you find proplot useful :) It's already published on Zenodo, but that probably wasn't clear: there was just a Zenodo badge on the GitHub home page. I've now added a link to the readthedocs homepage.