ridgeplot
Use a different KDE implementation
We have experienced some issues with statsmodels' KDE implementation (see the in-line comments in ridgeplot._kde.estimate_density_trace()).
- statsmodels uses scipy under the hood, so scipy itself could be a good alternative to investigate. pandas.Series.plot.kde also uses scipy.stats.gaussian_kde
- Alternatively, https://github.com/LBL-EESA/fastkde could also be an option
- Python 3.13 will ship a kde() utility as part of the built-in statistics module:
  - https://docs.python.org/3.13/library/statistics.html#statistics.kde
  - https://docs.python.org/3.13/library/statistics.html#sampling-from-kernel-density-estimation
  - https://github.com/python/cpython/pull/115863
  - https://github.com/python/cpython/issues/115532
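To make the standard-library option above a bit more concrete, here is a minimal sketch of what a density trace built on statistics.kde() could look like. The helper names (make_gaussian_kde, density_trace) and the padding/points parameters are hypothetical and not part of ridgeplot; the fallback branch is a plain Gaussian kernel sum so that the snippet also runs on interpreters older than 3.13.

```python
import math

try:
    # statistics.kde() is only available from Python 3.13 onwards.
    from statistics import kde as stdlib_kde
except ImportError:
    stdlib_kde = None


def make_gaussian_kde(samples, bandwidth):
    """Return a callable f(x) estimating the density of `samples`.

    Hypothetical helper for illustration; not part of ridgeplot.
    """
    samples = list(samples)
    if not samples:
        raise ValueError("samples must not be empty")
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")

    if stdlib_kde is not None:
        # Signature: kde(data, h, kernel='normal', *, cumulative=False)
        return stdlib_kde(samples, bandwidth, kernel="normal")

    # Fallback for older interpreters: sum of Gaussian kernels.
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def pdf(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return pdf


def density_trace(samples, bandwidth, points=200, padding=3.0):
    """Evaluate the KDE on an evenly spaced grid and return (xs, ys)."""
    samples = list(samples)
    f = make_gaussian_kde(samples, bandwidth)
    # Extend the grid a few bandwidths past the data so the tails are visible.
    lo = min(samples) - padding * bandwidth
    hi = max(samples) + padding * bandwidth
    step = (hi - lo) / (points - 1)
    xs = [lo + i * step for i in range(points)]
    ys = [f(x) for x in xs]
    return xs, ys
```

Note that statistics.kde() requires an explicit bandwidth h, so any automatic bandwidth selection currently provided by statsmodels would have to be handled separately.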
Things to keep in mind:
- Backwards compatibility with the existing ridgeplot() arguments that are passed to statsmodels' KDEUnivariate
- Performance, e.g., statsmodels provides a faster FFT implementation when using the Gaussian kernel.
- ...more?
Design-wise, would it not be cleanest to:
- separate plotting from density estimation a priori
- leave some configurability as to which method to use?
I wanted to write some abstract density estimation interfaces anyway, for skpro
(though ofc no need to use them - just saying that I have done some thinking around that topic).
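As a rough illustration of the separation suggested above, the sketch below treats a density estimator as any callable mapping (samples, grid) to density values, and lets the caller pick one by name or pass their own. Everything here (DensityEstimator, register_estimator, estimate_densities, the "gaussian" and "boxcar" estimators) is hypothetical naming for the sake of the example, not an existing ridgeplot or skpro interface.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple, Union

# Assumed interface: an estimator maps (samples, grid) -> densities on the grid.
DensityEstimator = Callable[[Sequence[float], Sequence[float]], List[float]]

_ESTIMATORS: Dict[str, DensityEstimator] = {}


def register_estimator(name: str):
    """Decorator that registers an estimator under a short name."""
    def decorator(func: DensityEstimator) -> DensityEstimator:
        _ESTIMATORS[name] = func
        return func
    return decorator


@register_estimator("gaussian")
def gaussian_estimator(samples, grid, bandwidth=0.5):
    """Plain Gaussian KDE with a fixed bandwidth."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)
        for x in grid
    ]


@register_estimator("boxcar")
def boxcar_estimator(samples, grid, bandwidth=0.5):
    """Rectangular-kernel KDE: counts samples within `bandwidth` of x."""
    norm = 1.0 / (2.0 * bandwidth * len(samples))
    return [
        norm * sum(1 for s in samples if abs(x - s) <= bandwidth)
        for x in grid
    ]


def estimate_densities(
    rows: Sequence[Sequence[float]],
    grid: Sequence[float],
    method: Union[str, DensityEstimator] = "gaussian",
) -> List[Tuple[List[float], List[float]]]:
    """Return one (x, y) trace per row; plotting code only consumes traces."""
    estimator = _ESTIMATORS[method] if isinstance(method, str) else method
    return [(list(grid), list(estimator(row, grid))) for row in rows]
```

With this split, the plotting layer would only ever see (x, y) traces, so swapping statsmodels for scipy, fastkde, or statistics.kde() becomes a matter of registering another estimator. Existing ridgeplot() keyword arguments could then be forwarded to a default statsmodels-backed estimator to preserve backwards compatibility.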