regplot: support confidence intervals with lowess model

Open jhamman opened this issue 9 years ago • 9 comments

Current regplot behavior when lowess=True is to ignore the confidence interval (ci) and bootstrap (n_boot) keyword arguments (example below):

sns.regplot('obs', 'mod', data=data, lowess=True, ci=95, n_boot=1000)

Any thoughts on making it possible to bootstrap the lowess model?

jhamman avatar May 11 '15 06:05 jhamman

Hello: First of all I would like to congratulate the authors on a wonderful library. May I ask what your plans regarding this question are? I am using the most up-to-date seaborn version through conda. Thanks again for a beautiful tool, Markus

metma99 avatar Dec 28 '17 23:12 metma99

Is there interest in adding confidence intervals to the lowess fits? I find them informative when looking at data. In the past, I have used the skmisc.loess library to achieve this, but it would be nice to incorporate it into seaborn. After digging around in the issues, it seems like this hasn't been incorporated because bootstrapping in statsmodels is slow.

arkottke avatar Mar 24 '21 04:03 arkottke

Bootstrapping the loess fits was not performant enough in the testing that was done when this came up previously.

I would strongly prefer not to add another dependency, so the best path forward would be to incorporate the confidence interval code from that library into statsmodels.

Another possibility would be to add smooth regression using statsmodels GAMs rather than loess.

mwaskom avatar Mar 24 '21 11:03 mwaskom

That makes sense to me. I will look into those options and can open this up again if I find anything useful.

arkottke avatar Mar 24 '21 15:03 arkottke

See also some related comments here: https://github.com/mwaskom/seaborn/issues/2351

Any path forward on lowess/smoothfit improvements will likely take the form of a new dedicated function that makes it easier to parameterize and use a different default approach to error bars ... too much packed into regplot at the moment.

mwaskom avatar Mar 24 '21 15:03 mwaskom

Is there any update on confidence intervals for loess or another smooth regression in regplot?

maoding avatar Apr 20 '23 08:04 maoding

Looking through the codebase, it is easy to see how to use the same tools to enable this bootstrap functionality.

import numpy as np
import seaborn as sns
from statsmodels.nonparametric.smoothers_lowess import lowess


def regplot_lowess_ci(data, x, y, ci_level, n_boot, **kwargs):
    x_ = data[x].to_numpy()
    y_ = data[y].to_numpy()
    # Evaluate every fit on one shared grid so the bootstrap curves line up.
    x_grid = np.linspace(start=x_.min(), stop=x_.max(), num=1000)

    def reg_func(_x, _y):
        # With xvals given, lowess returns only the fitted values at x_grid.
        return lowess(exog=_x, endog=_y, xvals=x_grid)

    # Refit the smoother on n_boot resamples of the (x, y) pairs.
    beta_boots = sns.algorithms.bootstrap(
        x_, y_,
        func=reg_func,
        n_boot=n_boot,
    )
    # Pointwise percentile interval across the bootstrap curves.
    err_bands = sns.utils.ci(beta_boots, ci_level, axis=0)
    y_plt = reg_func(x_, y_)

    ax = sns.lineplot(x=x_grid, y=y_plt, **kwargs)
    sns.scatterplot(x=x_, y=y_, ax=ax, **kwargs)
    ax.fill_between(x_grid, *err_bands, alpha=.15, **kwargs)
    return ax


mpg_df = sns.load_dataset('mpg')
ax = regplot_lowess_ci(mpg_df, x='mpg', y='acceleration', ci_level=99, n_boot=100)
ax.figure.show()

image
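For anyone who would rather not lean on sns.algorithms.bootstrap and sns.utils.ci (as far as I can tell, neither is part of seaborn's documented API), here is a rough sketch of the same percentile bootstrap in plain NumPy. The names bootstrap_band and cubic_fit are made up for illustration, and the cubic polynomial is only a stand-in smoother so that the block runs without statsmodels; it is not the lowess fit used in the function above.

```python
import numpy as np


def bootstrap_band(x, y, fit, n_boot=200, ci=95, seed=None):
    """Percentile bootstrap band for a curve-fitting callable (illustrative helper).

    fit(x, y) must return fitted values on a fixed grid, so that every
    resample produces an array of the same length.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    n = len(x)

    boots = []
    for _ in range(n_boot):
        # Resample (x, y) pairs with replacement and refit.
        idx = rng.integers(0, n, size=n)
        boots.append(fit(x[idx], y[idx]))
    boots = np.asarray(boots)

    # Pointwise percentiles across the bootstrap curves.
    tail = (100 - ci) / 2
    lower, upper = np.percentile(boots, [tail, 100 - tail], axis=0)
    return lower, upper


# Synthetic data, only to exercise the helper.
data_rng = np.random.default_rng(0)
x = data_rng.uniform(0, 10, 200)
y = np.sin(x) + data_rng.normal(scale=0.3, size=200)
grid = np.linspace(x.min(), x.max(), 50)


def cubic_fit(xs, ys):
    # Stand-in smoother (assumption): a cubic polynomial evaluated on the grid.
    return np.polyval(np.polyfit(xs, ys, 3), grid)


lower, upper = bootstrap_band(x, y, cubic_fit, n_boot=100, ci=95, seed=1)
```

To get a lowess band instead, something like `lambda xs, ys: lowess(ys, xs, xvals=grid)` can be passed as `fit`, using the statsmodels `lowess` imported earlier; the result can then be drawn with `ax.fill_between(grid, lower, upper, alpha=.15)`.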

Demetrio92 avatar Aug 07 '23 18:08 Demetrio92

It can then also be used with FacetGrid to replicate lmplot:

mpg_df.eval('heavy = weight>2803', inplace=True)
grid = sns.FacetGrid(mpg_df, col='heavy', sharex=False)
grid.map_dataframe(regplot_lowess_ci, x='mpg', y='acceleration', ci_level=99, n_boot=100)
grid.figure.show()

image

Demetrio92 avatar Aug 07 '23 18:08 Demetrio92

@mwaskom on my machine this runs in seconds, while sns.scatterplot regularly chokes on a few thousand data points. I am not sure "users might be waiting for a long time for plotting to finish" is a good argument for disabling the ci option for lowess altogether.

IMO, a warning, or a default of n_boot=100 when lowess=True, would be a better solution.

I'd be willing to incorporate the above workaround into the actual codebase, e.g. adapt this function https://github.com/mwaskom/seaborn/blob/master/seaborn/regression.py#L296

And change this line to a warning or a defaults handler: https://github.com/mwaskom/seaborn/blob/master/seaborn/regression.py#L211

and submit a PR
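A minimal sketch of what such a defaults handler might look like. This is not code from seaborn: resolve_n_boot and both constants are hypothetical names, and the sketch assumes the signature default for n_boot would become None so that an explicit user choice can be told apart from the default.

```python
import warnings

LOWESS_N_BOOT = 100    # assumed cheaper default, taken from the suggestion above
REGULAR_N_BOOT = 1000  # regplot's current default for n_boot


def resolve_n_boot(lowess, ci, n_boot=None):
    """Hypothetical helper: choose how many bootstrap resamples to draw."""
    if ci is None:
        return 0  # no interval requested, so no resampling is needed
    if n_boot is not None:
        return n_boot  # an explicit user choice always wins
    if lowess:
        warnings.warn(
            f"Bootstrapping a lowess fit can be slow; using n_boot={LOWESS_N_BOOT}. "
            "Pass n_boot explicitly to override.",
            UserWarning,
            stacklevel=2,
        )
        return LOWESS_N_BOOT
    return REGULAR_N_BOOT
```

With this shape, lowess=True with the default ci would warn once and fall back to 100 resamples, while non-lowess fits keep the existing behavior.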

Demetrio92 avatar Aug 07 '23 18:08 Demetrio92