Missing Window Functions from Polars
Polars has the following "rolling" functions (Explorer calls them window functions):
- polars.Series.rolling_apply
- polars.Series.rolling_max
- polars.Series.rolling_mean
- polars.Series.rolling_median
- polars.Series.rolling_min
- polars.Series.rolling_quantile
- polars.Series.rolling_skew
- polars.Series.rolling_std
- polars.Series.rolling_sum
- polars.Series.rolling_var
Although some of the rolling computations are already available in Explorer, the absence of a rolling_apply equivalent makes it less convenient to calculate certain statistics and models.
As a result, users are forced to resort to workarounds that are far from ideal. For example, one could calculate the 21-day rolling standard deviation by using additional columns and a combination of existing functions. However, for users familiar with Pandas, this approach can feel unusual.
Is there any plan to support rolling_apply in Explorer, or am I overlooking something?
I don't think we can support rolling_apply because it is not possible to call Erlang from C/Rust without using message passing. As far as I can see, the Python version linked is fully implemented in C.
Can you provide a more concrete example of what you are trying to address and how you are addressing it? Perhaps we can provide higher-level conveniences without having it named rolling_apply itself?
Sure!
I have a dataframe with daily stock returns, and I need the 21-day rolling volatility (standard deviation) of each series and the rolling correlation among these series.
My initial solution was similar to this:
require Explorer.Series, as: S
df = Explorer.Datasets.iris()
window_size = 3
S.to_enum(df[:sepal_length])
|> Enum.reduce({[], []}, fn e, {head, acc} ->
  head = head ++ [e]
  acc =
    if Enum.count(head) < window_size do
      acc ++ [nil]
    else
      acc ++
        [
          head
          |> Enum.reverse()
          |> Enum.take(window_size)
          |> S.from_list()
          |> S.standard_deviation()
        ]
    end
  {head, acc}
end)
|> elem(1)
I'm presenting this here so that the journey of how to implement this is documented as well; I hope it helps. This works for smaller dataframes, but performance takes a huge hit on larger ones.
So we got to a solution that looks like this:
df = Explorer.Datasets.iris()
window_size = 3
max_offset = S.size(df[:sepal_length]) - window_size
0..max_offset
|> Stream.map(&S.slice(df[:sepal_length], &1, window_size))
|> Stream.map(&S.standard_deviation/1)
|> Stream.chunk_every(1)
|> Stream.map(&S.from_list/1)
|> Enum.reduce(S.from_list([]), &S.concat(&2, &1))
If you have any pointers on this approach it would be of great help.
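One possible pointer, sketched in plain Elixir on lists rather than with Explorer: a rolling standard deviation can be updated incrementally from a running sum and a running sum of squares, so each step costs constant work instead of re-slicing the window. The module name `RollingStd` and the function `rolling_std/2` are hypothetical, and the sketch assumes the sample standard deviation (n - 1 denominator) with nil padding for the incomplete leading windows. The sum-of-squares formula can lose precision on values with a large mean, so treat this as an illustration only.

```elixir
defmodule RollingStd do
  # Hypothetical helper, not part of Explorer.
  # Sample standard deviation (n - 1 denominator) over a sliding window,
  # computed from a running sum and a running sum of squares.
  def rolling_std(values, size) when is_list(values) and is_integer(size) and size > 1 do
    {window, rest} = Enum.split(values, size)

    if length(window) < size do
      # Not even one full window: everything is nil.
      List.duplicate(nil, length(values))
    else
      sum = Enum.sum(window)
      sumsq = window |> Enum.map(&(&1 * &1)) |> Enum.sum()
      first = std(sum, sumsq, size)

      # Pair each value entering the window with the value leaving it.
      {stds, _} =
        rest
        |> Enum.zip(values)
        |> Enum.map_reduce({sum, sumsq}, fn {incoming, outgoing}, {s, sq} ->
          s = s + incoming - outgoing
          sq = sq + incoming * incoming - outgoing * outgoing
          {std(s, sq, size), {s, sq}}
        end)

      List.duplicate(nil, size - 1) ++ [first | stds]
    end
  end

  defp std(sum, sumsq, n) do
    variance = (sumsq - sum * sum / n) / (n - 1)
    # Clamp tiny negative values caused by floating-point rounding.
    :math.sqrt(max(variance, 0.0))
  end
end

RollingStd.rolling_std([1, 2, 3, 4, 5], 3)
# => [nil, nil, 1.0, 1.0, 1.0]
```

A series could be converted to a list before the call and back to a series afterwards, at the cost of copying the data out of the dataframe.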
Some things I need to calculate over rolling windows:
- Standard deviation, quantile, variance, skew, cumulative sum and cumulative sum product (available in polars)
- Correlation between different series
- GARCH
- Cointegration tests
Thanks!
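To make the "correlation between different series" item concrete, here is a minimal sketch of a rolling Pearson correlation over two plain Elixir lists. The module `RollingCorr` and its functions are hypothetical names, not Explorer API; the sketch assumes both inputs have the same length and pads the incomplete leading windows with nil.

```elixir
defmodule RollingCorr do
  # Hypothetical helper, not part of Explorer.
  # Pearson correlation of two equally sized lists;
  # returns nil when either list has zero variance.
  def pearson(xs, ys) do
    n = length(xs)
    mean_x = Enum.sum(xs) / n
    mean_y = Enum.sum(ys) / n

    {sxy, sxx, syy} =
      xs
      |> Enum.zip(ys)
      |> Enum.reduce({0.0, 0.0, 0.0}, fn {x, y}, {sxy, sxx, syy} ->
        dx = x - mean_x
        dy = y - mean_y
        {sxy + dx * dy, sxx + dx * dx, syy + dy * dy}
      end)

    if sxx == 0.0 or syy == 0.0 do
      nil
    else
      sxy / :math.sqrt(sxx * syy)
    end
  end

  # Correlation over every full window of `size` aligned elements.
  def rolling(xs, ys, size)
      when is_integer(size) and size > 1 and length(xs) == length(ys) do
    x_windows = Enum.chunk_every(xs, size, 1, :discard)
    y_windows = Enum.chunk_every(ys, size, 1, :discard)

    corrs =
      x_windows
      |> Enum.zip(y_windows)
      |> Enum.map(fn {wx, wy} -> pearson(wx, wy) end)

    List.duplicate(nil, min(size - 1, length(xs))) ++ corrs
  end
end

# Two perfectly correlated toy series with a 3-element window.
RollingCorr.rolling([1, 2, 3, 4], [2, 4, 6, 8], 3)
```

This recomputes each window from scratch, which keeps the code short but does size-proportional work per step.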
Maybe we could have a Series.window_map(series, callback) function? The callback receives the sliced series and must return numbers, something that we can convert to a series again later.
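As a rough illustration of what such a function might look like, here is a sketch that works on plain lists instead of series. Everything in it is an assumption: the module name `WindowSketch`, the explicit window-size argument, and the nil padding at the start are illustrative choices, not the API that Explorer ended up with.

```elixir
defmodule WindowSketch do
  # Hypothetical helper, not part of Explorer.
  # Applies `fun` to every full window of `size` consecutive elements and
  # pads the start with nil so the result has the same length as the input.
  def window_map(list, size, fun)
      when is_list(list) and is_integer(size) and size > 0 and is_function(fun, 1) do
    padding = List.duplicate(nil, min(size - 1, length(list)))

    values =
      list
      |> Enum.chunk_every(size, 1, :discard)
      |> Enum.map(fun)

    padding ++ values
  end
end

# Usage: a 3-element rolling mean.
WindowSketch.window_map([1, 2, 3, 4, 5], 3, fn w -> Enum.sum(w) / length(w) end)
# => [nil, nil, 2.0, 3.0, 4.0]
```

Because the callback runs in Elixir for every window, this trades speed for flexibility compared with the built-in window functions.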
Btw, I think your implementation could be:
0..max_offset
|> Stream.map(&S.slice(df[:sepal_length], &1, window_size))
|> Stream.map(&S.standard_deviation/1)
|> Enum.to_list()
|> S.from_list()
but I am not sure.
Would you like to send a PR for Series.window_map, btw?
I created this PR to explore the codebase a bit and test the waters. I'm waiting for a review on it to make sure everything is OK. After that, I plan on adding a number of functions that I need as well. Hope it helps.
Hi, I also have a use case for this. Here's equivalent code in Python:
df['atl'] = df['tss'].rolling(window=7).apply(lambda x: calculate_atl_recursive(x))
I can't solve this with the current package API, unless I'm missing something.
Thank you!
EDIT: I just realized who I'm in a thread with (famous people). Extra thank you for all your work.