temporian
temporian copied to clipboard
Moving quantile
This PR implements a slow but exact (as in not an approximation) of the moving quantile.
Implementation considerations:
The implementation splits the window into smaller and bigger values split into 2 heaps while balancing them to keep the desired value near the top, for instance, if looking for the median the heaps will be balanced, if looking for the 0.25 percentile the smaller heap will containing 20% of values.
It uses custom heaps because we need to remove items from the heap (that may or may not be at the top) as they are removed from the windows, for this a mapping is maintained in memory.
In cases where the desired quantile is between 2 values, the average of the 2 closer values is used.
The result is equivalent to Numpy's "averaged_inverted_cdf"
as tested in one of the unit tests.
I'm going to measure memory usage shortly but this could be an issue with long windows.
Timing considerations:
-
moving_quantile
is significantly slower than other moving operations (~20X slower thansimple_moving_average
) - it's comparable to the pandas equivalent
df.rolling(window=w, center=False).quantile(q)
. It's slower with longer windows and extreme quantiles (these make on of the heaps bigger which makes inserts more costly). See plots below
Note that our solution also works on non-uniformly sampled data where pandas wouldn't.
Other changes:
- had to make several changes to the window operations to accommodate the new parameter
quantile
- Fixed an issue with negative window lengths