Use a different KDE implementation

Open tpvasconcelos opened this issue 1 year ago • 1 comments

We have experienced some issues with statsmodels' KDE implementation (see the in-line comments in ridgeplot._kde.estimate_density_trace()).

  • statsmodels uses scipy under the hood, so using scipy directly could be a good alternative to investigate. pandas.Series.plot.kde also uses scipy.stats.gaussian_kde
  • Alternatively, https://github.com/LBL-EESA/fastkde could also be an option
  • Python 3.13 will ship a kde() utility as part of the built-in statistics module:
    • https://docs.python.org/3.13/library/statistics.html#statistics.kde
    • https://docs.python.org/3.13/library/statistics.html#sampling-from-kernel-density-estimation
    • https://github.com/python/cpython/pull/115863
    • https://github.com/python/cpython/issues/115532
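
As a rough illustration of the last option, here is a minimal sketch built around statistics.kde(). The density_trace() helper, its grid padding of three bandwidths, and the pure-Python fallback for interpreters older than 3.13 are assumptions made for this example and are not part of ridgeplot.

```python
import math

try:
    # statistics.kde() is only available from Python 3.13 onwards.
    from statistics import kde
except ImportError:
    def kde(data, h, kernel="normal"):
        """Illustrative fallback: Gaussian KDE with a fixed bandwidth h."""
        if kernel != "normal":
            raise ValueError("this fallback only supports the 'normal' kernel")
        points = list(data)
        norm = 1.0 / (len(points) * h * math.sqrt(2.0 * math.pi))

        def pdf(x):
            return norm * sum(math.exp(-0.5 * ((x - p) / h) ** 2) for p in points)

        return pdf


def density_trace(samples, h, n_points=200):
    """Hypothetical helper: evaluate the KDE on an evenly spaced grid."""
    f_hat = kde(samples, h, kernel="normal")
    # Pad the grid by three bandwidths so the tails are mostly captured.
    lo, hi = min(samples) - 3 * h, max(samples) + 3 * h
    step = (hi - lo) / (n_points - 1)
    xs = [lo + i * step for i in range(n_points)]
    ys = [f_hat(x) for x in xs]
    return xs, ys


samples = [-2.1, -1.3, -0.4, 1.9, 5.1, 6.2]
xs, ys = density_trace(samples, h=1.5)
```

Note that statistics.kde() takes an explicit bandwidth h and does no automatic bandwidth selection, so a bandwidth rule would have to be supplied separately.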

Things to keep in mind:

  • Backwards compatibility with the existing ridgeplot() arguments that are passed to statsmodels' KDEUnivariate
  • Performance: for example, statsmodels provides a faster FFT-based implementation when using the Gaussian kernel.
  • ...more?

tpvasconcelos avatar Jun 22 '23 16:06 tpvasconcelos

Design-wise, would it not be cleanest to:

  • separate plotting from density estimation up front
  • leave the choice of density estimation method configurable?

I wanted to write some abstract density estimation interfaces anyway, for skpro (though ofc no need to use them - just saying that I have done some thinking around that topic).
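
To make the separation concrete, here is a minimal sketch of one possible shape for it. Every name in it (DensityEstimator, ESTIMATORS, estimate_density, make_ridge_traces) is hypothetical and exists neither in ridgeplot nor in skpro; the built-in Gaussian estimator is a plain-Python stand-in for whichever backend ends up being chosen.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple, Union

# A density estimator maps (samples, grid) to density values on that grid.
DensityEstimator = Callable[[Sequence[float], Sequence[float]], List[float]]


def gaussian_kde(samples, grid, bandwidth=1.0):
    """Stand-in backend: fixed-bandwidth Gaussian KDE in pure Python."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)
        for x in grid
    ]


# Registry of named methods (hypothetical); users may also pass a callable.
ESTIMATORS: Dict[str, DensityEstimator] = {"gaussian": gaussian_kde}


def estimate_density(samples, grid, method: Union[str, DensityEstimator] = "gaussian"):
    """Density-estimation layer: resolve the method, then delegate to it."""
    estimator = method if callable(method) else ESTIMATORS[method]
    return estimator(samples, grid)


def make_ridge_traces(datasets, grid, method="gaussian") -> List[Tuple[list, list]]:
    """Plotting-side layer: only ever sees (x, y) pairs, never the estimator."""
    return [(list(grid), estimate_density(s, grid, method)) for s in datasets]


def flat_estimator(samples, grid):
    """A user-supplied method: uniform density over the grid's range."""
    return [1.0 / (grid[-1] - grid[0])] * len(grid)


grid = [i / 10.0 for i in range(-50, 51)]
traces = make_ridge_traces([[0.0, 1.0], [-1.0, 2.0, 3.0]], grid)
custom = make_ridge_traces([[0.0, 1.0]], grid, method=flat_estimator)
```

With this split, swapping the KDE backend becomes a change to the registry or to the method argument, and the plotting code stays untouched.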

fkiraly avatar Feb 12 '24 00:02 fkiraly