
Able to add list of percentiles to df.describe()

Open · leoliu0 opened this issue

It would be nice to be able to pass a list of percentiles for df.describe() to report.

For example, df.describe([0.25, 0.75]) would report:

describe (str): "mean", "std", "min", "25%", "median", "75%", "max"

It would be great if this were the default for df.describe.

leoliu0 · Aug 06 '22 04:08

Hello, thank you for this request.

Please note that describe is just shorthand that iterates over the aggregation functions: https://github.com/pola-rs/polars/blob/093d55c6b4bed6b76f3d814e7e66030dac3c4f87/py-polars/polars/internals/frame.py#L2484

You could thus simply create your own (note: not tested!):

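def describe(self, qs: list[float] | None = None):
    def describe_cast(self: DF) -> DF:
        ...  # cast columns so the per-statistic frames can be concatenated (body elided)

    stats = [self.mean(), self.std(), ...]
    labels = ["mean", "std", ...]
    if qs:
        # one extra row per requested quantile (qs given as fractions)
        stats += [self.quantile(q) for q in qs]
        labels += [f"quantile{q}" for q in qs]

    summary = self._from_pydf(
        pli.concat([describe_cast(st) for st in stats])._df
    )
    # prepend the label column so each row says which statistic it holds
    summary.insert_at_idx(0, pli.Series("describe", labels))
    return summary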

zundertj · Aug 06 '22 12:08

Thanks! This works great as a temporary fix.

leoliu0 · Aug 07 '22 03:08

@zundertj @leoliu0

Could you please fill in the blanks in the function above to show how to add percentiles?

Are there any plans to add the percentiles argument to the .describe() function? This would be super helpful!

MariusMerkleQC · Apr 05 '23 15:04

See the qs input argument: those are the percentiles, expressed as fractions:

df = pl.DataFrame(...)
describe(df, qs=[0.1, 0.25, 0.5, 0.75, 0.9])

zundertj · Apr 07 '23 17:04

+1 to adding this feature. It would be more consistent with the pandas describe functionality.

wangkev · Apr 07 '23 20:04