Optimize `_get_describe_dict` to minimize repetitive computation

Open tamargrey opened this issue 3 years ago • 0 comments

Many of the stats calculated by `_get_describe_dict` (min, max, quartiles, top values, recent values) rely on having sorted data, and the current implementation is likely repeating some of that sorting in order to compute them.

For example, we calculate all of the aggregation stats at once, but we then make a separate call to `series.quantile([0.25, 0.5, 0.75])`.

We should look into performing these calculations in a way that sorts the data only once, since most of the remaining computations, once the data is sorted, will not depend on the length of the data.
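As a rough illustration of the idea (not the actual `_get_describe_dict` code), here is a minimal sketch that sorts once and then reads min, max, and the quartiles by position. It uses plain Python lists so it stays dependency-free; the helper names `quantile_from_sorted` and `describe_sorted_once` and the dictionary keys are hypothetical, and the quartiles use linear interpolation, which I'm assuming matches what we get from `series.quantile` by default.

```python
import math


def quantile_from_sorted(sorted_values, q):
    """Linear-interpolation quantile on data that is ALREADY sorted.

    Hypothetical helper: it only indexes into the list, so it does not
    trigger another sort.
    """
    position = q * (len(sorted_values) - 1)
    lower = math.floor(position)
    upper = math.ceil(position)
    fraction = position - lower
    low_value = sorted_values[lower]
    high_value = sorted_values[upper]
    return low_value + fraction * (high_value - low_value)


def describe_sorted_once(values):
    """Hypothetical sketch: sort once, then derive order statistics."""
    # Drop missing values (None or NaN) before sorting.
    clean = [
        v for v in values
        if v is not None and not (isinstance(v, float) and math.isnan(v))
    ]
    if not clean:
        return {}

    ordered = sorted(clean)  # the only O(n log n) step

    # Everything below is a positional lookup on the sorted data.
    return {
        "min": ordered[0],
        "first_quartile": quantile_from_sorted(ordered, 0.25),
        "second_quartile": quantile_from_sorted(ordered, 0.5),
        "third_quartile": quantile_from_sorted(ordered, 0.75),
        "max": ordered[-1],
    }


stats = describe_sorted_once([4, 1, None, 3, 2])
print(stats)
```

Whether something like this actually beats the current aggregation plus `series.quantile` calls would need to be benchmarked on real pandas data; the sketch only shows that the order statistics can share a single sort.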

tamargrey, Jul 07 '21 21:07