temporian icon indicating copy to clipboard operation
temporian copied to clipboard

Moving quantile

Open javiber opened this issue 9 months ago • 0 comments

This PR implements a slow but exact (as in not an approximation) of the moving quantile.

Implementation considerations:

The implementation splits the window into smaller and bigger values split into 2 heaps while balancing them to keep the desired value near the top, for instance, if looking for the median the heaps will be balanced, if looking for the 0.25 percentile the smaller heap will containing 20% of values.

It uses custom heaps because we need to remove items from the heap (that may or may not be at the top) as they are removed from the windows, for this a mapping is maintained in memory.

In cases where the desired quantile is between 2 values, the average of the 2 closer values is used.

The result is equivalent to Numpy's "averaged_inverted_cdf" as tested in one of the unit tests.

I'm going to measure memory usage shortly but this could be an issue with long windows.

Timing considerations:

  • moving_quantile is significantly slower than other moving operations (~20X slower than simple_moving_average)
  • it's comparable to the pandas equivalent df.rolling(window=w, center=False).quantile(q). It's slower with longer windows and extreme quantiles (these make on of the heaps bigger which makes inserts more costly). See plots below

Note that our solution also works on non-uniformly sampled data where pandas wouldn't.

image image

Other changes:

  • had to make several changes to the window operations to accommodate the new parameter quantile
  • Fixed an issue with negative window lengths

javiber avatar May 06 '24 22:05 javiber