Add standard error of the mean function `sem()` for GroupBy

Open rk-exxec opened this issue 2 years ago • 0 comments

Problem description

I wish Polars could calculate the standard error of the mean (SEM) per group, as pandas does with GroupBy.sem: https://pandas.pydata.org/docs/reference/api/pandas.core.groupby.GroupBy.sem.html

I need to calculate many confidence intervals, and a sem() function would make my scripts much simpler.

Example:

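import polars as pl

# `data` is the input DataFrame; `groupby` is the list of key columns.
# Select every column except the two helper columns added below.
columns = pl.exclude(["count", "sqrt"])
mean_df = (
   data
   # Attach the size of each group to every row.
   .with_columns(pl.count().over(groupby))
   .with_columns([
      pl.col("count").sqrt().alias("sqrt"),
   ])
   .groupby(groupby)
   .agg([
      columns.mean(),
      columns.std().suffix("_SDEV"),
      # SEM = std / sqrt(n); "sqrt" is constant within a group, so take its first value.
      (columns.std() / pl.col("sqrt")).first().suffix("_SEM_68"),
   ])
)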

would become

mean_df = (
   data
   .groupby(groupby)
   .agg([
      pl.all().mean(),
      pl.all().std().suffix("_SDEV"),
      pl.all().sem().suffix("_SEM_68"),  # proposed sem(): already one value per group, like std()
   ])
)
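
For reference, here is a minimal plain-Python sketch of the per-group value both snippets are meant to produce: the sample standard deviation divided by the square root of the group size. It assumes `ddof=1`, which is the default of the pandas `GroupBy.sem` linked above. The function name `group_sem` and the sample rows are made up for illustration and are not part of Polars or pandas.

```python
import math
import statistics
from collections import defaultdict


def group_sem(rows, key, value, ddof=1):
    """Return {group: standard error of the mean of `value`} (hypothetical helper)."""
    # Collect the values of each group.
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[value])

    result = {}
    for name, values in groups.items():
        n = len(values)
        if n <= ddof:
            # Not enough observations for a sample variance.
            result[name] = float("nan")
            continue
        mean = statistics.fmean(values)
        variance = sum((v - mean) ** 2 for v in values) / (n - ddof)
        # SEM = sample standard deviation / sqrt(n)
        result[name] = math.sqrt(variance) / math.sqrt(n)
    return result


# Made-up sample data: two groups of different sizes.
rows = [
    {"group": "a", "x": 1.0},
    {"group": "a", "x": 2.0},
    {"group": "a", "x": 3.0},
    {"group": "b", "x": 10.0},
    {"group": "b", "x": 14.0},
]
sem = group_sem(rows, key="group", value="x")
```

Under a normal approximation, mean ± 1 SEM gives roughly a 68% confidence interval, which is presumably what the `_SEM_68` suffix in the snippets refers to.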

rk-exxec, Jan 11 '23 14:01