cudf [FEA] need official EWM function

Is your feature request related to a problem? Please describe. EWM is a very popular method used in time series analysis, especially in the domain of FSI. cuIndicator is using EWM a lot to compute the technical indicators. It is good to have official support in the cuDF.

Describe the solution you'd like DataFrame.ewm(com=None, span=None, halflife=None, alpha=None, min_periods=0, adjust=True, ignore_na=False, axis=0). The same interface as the Pandas EWM function

Describe alternatives you've considered cuIndicator has the implementation that is based on rolling window methods. cuIndicator EWM.

Additional context EWM can be implemented by prefix-sum method if we weight the past carefully. I have the example implementation for it.

[x] mean
[ ] standard deviation
[ ] variance
[ ] covariance
[ ] correlation

Mar 22 '19 19:03 yidong72

@yidong72 @randerzander @beckernick What are the specific aggregations needed to implement this on top of the new rolling window functionality?

Jul 04 '19 05:07 harrism

Hello, any plan in merging this or other implementation/s of EWM function? Or any temp fix that I could use for now? Appreciated.

Jun 29 '20 01:06 rouniuyizu

@kkraus14 @yidong72 @beckernick need help understanding what is needed from libcudf.

Jul 02 '20 04:07 harrism

We haven't scoped this function as of yet from the cuDF Python side so we can't guide libcudf as of yet. I don't think this is currently a high priority for us.

Jul 02 '20 04:07 kkraus14

@harrism. The implementation I have is just adding a weight term to the time series items in the rolling window fun. So it should be straight forward implementation on top of rolling window fun

Jul 02 '20 20:07 yidong72

We have seen some folks at FSI who are interested in the official EWM function. Check this issue we got from gQuant project. I can fix it in a hacky way but it is nice to have official support from cudf.

https://github.com/rapidsai/gQuant/issues/100

Please increase the priority of this issue.

Aug 12 '20 15:08 yidong72

In pandas, EWM provides various exponential weighted functions including mean, variance, standard deviation, and more. I'm going to update the issue to include a task-list of the various functions.

Exponential weighted mean is the canonical usage, which makes it a good starting point for the next release.

Jul 14 '21 15:07 beckernick

I've scoped this out and there's a couple of design caveats I would like to discuss before proceeding with an implementation.

TL;DR: I am not sure how to do this in a way that is actually performant.

This function in pandas behaves more like a single large window over the entire data than a rolling window function like what is normally envisioned. That is to say that by default, each element of the result is the weighted average of all of those that come before it in the sequence.

The formula for a single result is quite clear from the pandas documentation:

There's really two straightforward ways of computing this sequence and neither of them seem to really help us very much.

Compute all the ys in parallel (each processing element is responsible for a single y). This doesn't help us because the work is very uneven, with the worst off thread (the last one) still needing to compute coefficients for the entire sequence and then reduce the result into a weighted average.
Generate the results sequentially, keeping track of the numerator and denominator of the equation, and recognizing that each time we advance we need only multiply the numerator by a factor of (1 - α), and multiply the denominator by the same (1 - α) and add 1 to get the next result. This is indeed how pandas makes the calculation tractable.

In the case where we really are using this within a window function, this problem goes away, as long as the window size is small relative to the data size (each thread applies the above sequential algorithm for its window). We could thus implement this on top of rolling technically, but we can't just wrap that functionality with window=len(data) otherwise the performance will be abysmal.

It seems like what is needed here is a truly parallel algorithm that properly balances the work each computing element is doing across the moving average calculation.

Jul 28 '21 16:07 brandon-b-miller

This can be computed efficiently in parallel using two scans (one for the numerator, one for the denominator) and a binop (divide).

Jul 29 '21 00:07 harrism

Unless I am misunderstanding that works for getting one of the datapoints we need (any single one) but not the entire sequence. Each element of the result is the result of dividing two things, but those things are the sums of sequences and those sequences are different for each element in question. Consider the first few denominators d_i and let beta = (1 - alpha):

d_0 = 1 d_1 = 1 + beta d_2 = 1 + beta + beta ** 2

In general d_n = sum(k = 0:n [beta ** n]) which is a finite series for which I do not think there is a closed form solution in terms of simple arithmetic operations. That said, one can see that

d_1 = beta * (1) + 1 = (beta * d_0) + 1 d_2 = beta * (1 + beta) + 1 = (beta * d_1) + 1

Meaning each successive term is related to the last by

d_this = (beta * d_previous) + 1

Which makes for an efficient serial algorithm for computing these terms without having to actually sum over an entirely new set of numbers. Unfortunately this doesn't seem to help us towards a thrust implementation because if we were trying to do an inclusive scan, we'd have this as our binary_op:

def f(d_previous, d_this):
    return (beta * d_previous) + 1
beta = 0.1
f(1, f(2,3))
# 1.11
f(f(1,2), 3)
# 1.1

I believe this breaks the associativity needed for an inclusive scan.

Jul 29 '21 20:07 brandon-b-miller

Is this the naive implementation, or is this totally wrong?

std::vector<double> input(...);
std::vector<double> output(...);
double alpha = 0.5;
auto running_sum = 0;

for (auto i = 0; i < input.size(); i++) {
  running_sum = input[i] + running_sum * (1 - alpha);
  output[i] = running_sum;
}

Jul 29 '21 22:07 cwharris

Solving recurrence equations is in Guy Bleloch's classic paper "Prefix Sums and their applications". http://www.cs.cmu.edu/~blelloch/papers/Ble93.pdf (Section 1.4)

The trick is to maintain the intermediates as pairs, rather than as individual values. Let b = 1-alpha. For the numerator, the input is a list of pairs [(b, x_0), (b, x_1), (b, x_2), ...]. The numerator operator (in Python), is:

def f(pair_i, pair_j):
    return (pair_i[0]*pair_j[0], pair_i[1]*pair_j[0] + pair_j[1])

Test input demonstrates associativity:

>>> beta=0.1
>>> a=(beta, 1)
>>> b=(beta, 2)
>>> c=(beta, 3)
>>> f(a, f(b, c))
(0.0010000000000000002, 3.21)
>>> f(f(a, b), c)
(0.0010000000000000002, 3.21)

To get the numerator out of the scan, after performing the inclusive scan, just extract all the second elements of the pairs.

Intuitively, we are propagating the product of the 1-alphas separately from the summation. This is a really simple recurrence, for which Blelloch gives a comprehensive proof and requirements on the operator. He also proves that higher order recurrences can be reduced and implemented with scans the same way, with the appropriate operator.

This paper is required reading, IMO. You will see scans everywhere once you start seeing them. :)

(Note, implementation with Thrust is pretty simple -- just use a zip iterator with a constant iterator (1-alpha) and the input iterator and use a lambda that returns the modified pair as in the Python f.)

Jul 29 '21 23:07 harrism

I stumbled on this paper this morning while Googling "Prefix sums recursion relations" after a few of us met to discuss this problem yesterday. It's so elegant how separating the current power of the prefactor makes the recursion operator associative! Thanks for pointing us in the right direction.

Jul 30 '21 18:07 vyasr

thanks @harrism this works perfectly using thrust in my experiments. It's a little hard to for me to tell if this really belongs as a rolling aggregation, should that still be the plan or is there a more appropriate place for this to live inside of libcudf?

Aug 02 '21 14:08 brandon-b-miller

My pleasure. I don't know the answer to your question. Is it different from a rolling aggregation in some way? Does it have finite window extents, or does every element depend on all preceding elements over the entire series? CC @jrhemstad

Aug 03 '21 04:08 harrism

The particular pandas API is the version where every element depends on all the previous ones.

pandas does support a windowed version of this via different API. But I am not sure our version, were we ever to support it, would need to actually parallelize within the windows - at least for small window sizes relative to the data the normal recurrence relation might perform fine on its own within the windows.

Aug 03 '21 11:08 brandon-b-miller

For the one where every element depends on all previous ones, it may be best to add this as an operator to our existing scan API.

The windowed version sounds like rolling. Or could be done as an operator to the segmented scan API.

Aug 04 '21 23:08 harrism

Dose it i supported in 23.02?

Mar 02 '23 16:03 Haidow

Hi @Haidow , We've been focusing on some other priorities for a while since we last looked at this, but there's been some more interest recently, so I'll start a conversation about finishing this up in the next few releases.

Mar 06 '23 14:03 brandon-b-miller

@brandon-b-miller Yes plz!

Jul 15 '23 08:07 felixmccuaig

Thanks @brandon-b-miller, any update on this?

Nov 23 '23 01:11 cpowr

#9027 adds the mean aggregation. We'll follow up in additional PRs with the other functions.

Jun 24 '24 19:06 vyasr

Congrats on finally merging this @brandon-b-miller! Scan FTW!

Jun 24 '24 22:06 harrism