polars Able to add list of percentiles to df.describe()

Open leoliu0 opened this issue 2 years ago • 2 comments

It would be nice to be able to pass a list of percentiles to report in df.describe().

For example, df.describe([0.25, 0.75]) would report:

describe str | "mean" "std" "min" "25%" "median" "75%" "max"

It would be great if this were the df.describe default.

Aug 06 '22 04:08 leoliu0

Hello, thank you for this request.

Please note that describe is just a short-hand iterating over the aggregation functions: https://github.com/pola-rs/polars/blob/093d55c6b4bed6b76f3d814e7e66030dac3c4f87/py-polars/polars/internals/frame.py#L2484

You could thus simply create your own (note: not tested!):

def describe(self, qs: list[float] | None = None):
    def describe_cast(self: DF) -> DF:
        ...

    # "..." stands for the remaining aggregations (elided here).
    stats = [self.mean(), self.std(), ...]
    labels = ["mean", "std", ...]
    if qs:
        # One extra row per requested quantile.
        stats += [self.quantile(q) for q in qs]
        labels += [f"quantile{q}" for q in qs]

    summary = self._from_pydf(
        pli.concat([describe_cast(st) for st in stats])._df
    )
    summary.insert_at_idx(0, pli.Series("describe", labels))
    return summary

Aug 06 '22 12:08 zundertj

Thanks! This works great as a temporary fix.

Aug 07 '22 03:08 leoliu0

@zundertj @leoliu0

Could you please fill in the blanks in the above function on how to add percentiles?

Are there any plans to add the percentiles argument to the .describe() function? This would be super helpful!

Apr 05 '23 15:04 MariusMerkleQC

See the qs input argument: it takes the percentiles as fractions.

df = pl.DataFrame(...)
describe(df, qs=[0.1, 0.25, 0.5, 0.75, 0.9])

Apr 07 '23 17:04 zundertj

+1 to adding this feature. Would be more consistent with pandas describe functionality.

Apr 07 '23 20:04 wangkev
