ENH: Allow to plot weighted KDEs.

Open LucaMingarelli opened this issue 1 year ago • 1 comments

Feature Type

  • [X] Adding new functionality to pandas

  • [ ] Changing existing functionality in pandas

  • [ ] Removing existing functionality in pandas

Problem Description

pandas.DataFrame.plot.kde does not currently support plotting weighted KDEs.

Feature Description

The PDF is currently estimated via scipy.stats.gaussian_kde, which accepts a weights parameter. pandas.DataFrame.plot.kde should accept this parameter as well.
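For reference, here is a minimal sketch of what the weights parameter does when scipy.stats.gaussian_kde is called directly, outside of pandas. The sample values and weights are borrowed from the example later in this thread; the evaluation grid (three units beyond the data on each side, 500 points) is an arbitrary choice for illustration.

```python
import numpy as np
from scipy.stats import gaussian_kde

# Sample values and weights taken from the example later in this thread.
data = np.array([1, 2, 2.5, 3, 3.5, 4, 5], dtype=float)
weights = np.array([0.1, 0.0, 0.0, 0.2, 0.3, 0.4, 0.9])

# Unweighted estimate: every observation contributes equally.
kde_plain = gaussian_kde(data)

# Weighted estimate: scipy normalizes the weights internally.
kde_weighted = gaussian_kde(data, weights=weights)

# Evaluate both densities on a common grid (grid bounds are an assumption).
grid = np.linspace(data.min() - 3, data.max() + 3, 500)
density_plain = kde_plain(grid)
density_weighted = kde_weighted(grid)
```

Plotting density_plain and density_weighted against grid with matplotlib gives curves comparable to what a weights-aware plot.kde would be expected to draw.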

Alternative Solutions

Allow a weights parameter to be passed through to scipy.stats.gaussian_kde.

Additional Context

No response

LucaMingarelli avatar Jun 24 '24 22:06 LucaMingarelli

Hello, I am working on it.

fbourgey avatar Jun 28 '24 12:06 fbourgey

I updated the following:

https://github.com/fbourgey/pandas/blob/feature-plot-weighted-kde/pandas/plotting/_core.py#L1449
https://github.com/fbourgey/pandas/blob/feature-plot-weighted-kde/pandas/plotting/_matplotlib/hist.py#L266

The code works.

Should we add an example using the weights parameter to the kde function's docstring? Does this function need to be updated as well?

fbourgey avatar Jul 24 '24 18:07 fbourgey

The following code gives

import pandas as pd

s = pd.Series([1, 2, 2.5, 3, 3.5, 4, 5])
ax = s.plot.kde()

[Figure_0: KDE plot of s without weights]

Passing some weights produces

weights = pd.Series([0.1, 0.0, 0.0, 0.2, 0.3, 0.4, 0.9])
ax = s.plot.kde(weights=weights)

[Figure_1: KDE plot of s with weights]

Using a NumPy array works as well

import numpy as np
weights = np.array([0.1, 0.4, 0.0, 0.2, 0.3, 0.4, 0.2])

However, passing a list instead

weights = [0.1, 0.4, 0.0, 0.2, 0.3, 0.4, 0.2]

raises the following error

  File "/Users/florianbourgey/projects/misc/pandas_gaussian_kde.py", line 7, in <module>
    ax = s.plot.kde(weights=weights)
  File "/Users/florianbourgey/projects/pandas/pandas/plotting/_core.py", line 1567, in kde
    return self(kind="kde", bw_method=bw_method, weights=weights, ind=ind, **kwargs)
  File "/Users/florianbourgey/projects/pandas/pandas/plotting/_core.py", line 1049, in __call__
    return plot_backend.plot(data, kind=kind, **kwargs)
  File "/Users/florianbourgey/projects/pandas/pandas/plotting/_matplotlib/__init__.py", line 71, in plot
    plot_obj.generate()
  File "/Users/florianbourgey/projects/pandas/pandas/plotting/_matplotlib/core.py", line 500, in generate
    self._make_plot(fig)
  File "/Users/florianbourgey/projects/pandas/pandas/plotting/_matplotlib/hist.py", line 168, in _make_plot
    kwds["weights"] = type(self)._get_column_weights(self.weights, i, y)
  File "/Users/florianbourgey/projects/pandas/pandas/plotting/_matplotlib/hist.py", line 202, in _get_column_weights
    weights = weights[~isna(y)]
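
The traceback above is cut off before the exception line, so the exact error is not shown. One plausible cause, stated here as an assumption, is that a plain Python list cannot be indexed with a boolean mask the way a Series or NumPy array can. The sketch below reproduces that behaviour in isolation and shows one possible remedy, coercing the weights with np.asarray before masking. The names y, mask, weights_list and weights_arr are illustrative and are not taken from the pandas source.

```python
import numpy as np
import pandas as pd

# Hypothetical column values with one missing entry.
y = np.array([1.0, np.nan, 2.5, 3.0])
mask = ~pd.isna(y)  # boolean NumPy array, True where y is present

weights_list = [0.1, 0.4, 0.0, 0.2]

# A plain list does not support boolean-mask indexing.
try:
    weights_list[mask]
    list_indexing_failed = False
except TypeError:
    list_indexing_failed = True

# Coercing to an array first makes the same mask usable.
weights_arr = np.asarray(weights_list, dtype=float)
filtered = weights_arr[mask]
```

If this is indeed the cause, converting weights to an array once, before the per-column masking, would let lists, arrays and Series all follow the same code path.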

fbourgey avatar Jul 26 '24 20:07 fbourgey