polars
Ability to add a list of percentiles to df.describe()
It would be nice to be able to pass a list of percentiles for df.describe() to report.
For example, df.describe([0.25, 0.75]) would report these rows in the describe (str) column:
"mean", "std", "min", "25%", "median", "75%", "max"
It would be great if this were the default for df.describe.
Hello, thank you for this request.
Please note that describe is just a shorthand that iterates over the aggregation functions: https://github.com/pola-rs/polars/blob/093d55c6b4bed6b76f3d814e7e66030dac3c4f87/py-polars/polars/internals/frame.py#L2484
You could thus simply create your own (note: not tested!):
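from __future__ import annotations

import polars as pl
import polars.selectors as cs


def describe(df: pl.DataFrame, qs: list[float] | None = None) -> pl.DataFrame:
    # Simplification: keep only numeric columns so every statistic fits in Float64
    num = df.select(cs.numeric())
    stats = [num.mean(), num.std(), num.min(), num.median(), num.max()]
    labels = ["mean", "std", "min", "median", "max"]
    if qs:
        # One single-row frame per requested quantile, given as a fraction (e.g. 0.25)
        stats += [num.quantile(q) for q in qs]
        labels += [f"quantile{q}" for q in qs]
    # Cast every single-row frame to Float64 so they can be stacked vertically
    summary = pl.concat([st.select(pl.all().cast(pl.Float64)) for st in stats])
    # Prepend the column of statistic labels
    return pl.concat(
        [pl.DataFrame({"describe": labels}), summary], how="horizontal"
    )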
Thanks! This works great as a temporary fix.
@zundertj @leoliu0
Could you please fill in the blanks in the function above to show how to add percentiles?
Are there any plans to add the percentiles argument to the .describe() function? This would be super helpful!
See the qs input argument; it takes the percentiles as fractions:
df = pl.DataFrame(...)
describe(df, qs=[0.1, 0.25, 0.5, 0.75, 0.9])
+1 to adding this feature. It would be more consistent with the pandas describe functionality.