Add window features for RegressionModels
In addition to the "lag features" we already have, it'd be nice to add "window features": specifying window characteristics and the corresponding function(s) to apply, in order to create features dynamically in regression models. For instance, it is often helpful to use the trailing mean and variance of the last N points as features. We could also imagine a way to define fairly generic windows (e.g., "last month", "last week", "the N points starting N-k time steps ago", etc.).
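For illustration, here is a minimal sketch of the trailing mean and variance features mentioned above, written with plain pandas rather than any Darts API. The helper name `trailing_window_features` and the one-step shift (so that the feature at time t only uses earlier points) are assumptions made for this example, not part of the proposal.

```python
import pandas as pd


def trailing_window_features(series: pd.Series, window: int) -> pd.DataFrame:
    """Trailing mean and variance over the `window` points before each step.

    Hypothetical helper for illustration only; not part of Darts.
    """
    # Shift by one step so the feature at time t uses only values before t.
    past = series.shift(1)
    rolled = past.rolling(window=window)
    return pd.DataFrame(
        {
            f"mean_{window}": rolled.mean(),
            f"var_{window}": rolled.var(),  # sample variance (ddof=1)
        }
    )


y = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
features = trailing_window_features(y, window=3)
```

The first `window` rows are NaN because a full window of past values is not yet available there; a real implementation would need to decide how to handle those rows.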
@hrzn Hi! Will have a go at this one instead 😄
That'd be awesome!! You can take a look at this talk for a nice overview. In Darts the RegressionModel class would be the place to start. Let us know if there are some design decisions you'd like to discuss. The most important thing will be to get the API right and keep it as simple as possible.
@adamkells are you working on this one?
@hrzn Planning to work on this on Friday. Watched the talk and want to get some thoughts on scope and design.
I think we probably want to add the functionality in one of two ways:
Option 1: Something analogous to the way lags are currently handled. So adding as function inputs:
- windows: (list of integers specifying window sizes for target column)
- windows_past_covariates: (list of integers specifying window sizes for covariates)
- window_functions: (list containing strings specifying possible windowing functions mean, z_score, ewma etc.)
Option 2: Just having a single nested input dictionary:
{'target': {'function': 'ewma', 'window_size': 5},
'covariate_1': {'function': 'mean', 'window_size': 10}}
What do you think?
awesome :)
I think I'm more in favour of Option 2, mostly in order to keep the API not too cluttered, and keep some flexibility in the structure of this dictionary without requiring further adaptations to the call signature. We also already have such a dict for the add_encoders parameter (see docs).
I think your example looks quite good - some notes:
- I think we should probably also support windows on future covariates (not only target and past covariates)
- For this reason, although it could be quite powerful to be able to specify per-covariate-dimension windows, I think it'd already be very nice to have the same window applied to all components of {past, future}_covariates. You could also try to do it fancy directly, but I expect a bit of complexity, for instance to handle the case where the target and the past (or future) covariate series share components with the same names. One way could be to make it look like this:
{
'target': {'function': 'ewma', 'window_size': 5},
'future': {'all': [{'function': 'mean', 'window_size': 10}]},
'past': {'component1': [{'function': 'ewma', 'window_size': 5}]}
}
but it is slightly more complex...
- We probably need to accept a list of functions (to add potentially several windows)
- It'd be nice to accept an actual Python function (e.g., a lambda) in addition to a name - something applying on a DataFrame. This way users can specify their own windowing functions :)
@dennisbader @piaz97, wdyt?
Update on this:
Dictionary format:
- I think the best format to avoid overly nested dictionaries is to allow only the keys 'target', 'future' and 'past'. I'd prefer to keep variable-specific transformations as a separate PR to reduce the scope of this piece.
- The value for each key can then be either a dictionary defining the function to be applied or a list of dictionaries for multiple functions.
- Each value dictionary can have the function to be used and all the parameters to be passed to the function.
{
'target': {'function': user_function, 'param_1': 1, 'param_2': 2},
'future': {'function': 'mean', 'window_size': 10},
'past': [{'function': 'ewma', 'window_size': 5},
{'function': user_function, 'param_1': 1, 'param_2': 2}]
}
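As a rough sketch of how such a dictionary could be validated and normalized, the snippet below turns every value into a list of specification dictionaries, so downstream code only has to handle one shape. The helper name `normalize_window_config` and the exact error messages are hypothetical; only the keys 'target', 'past', 'future' and the 'function' entry come from the format described above.

```python
from typing import Any, Dict, List, Union

WindowSpec = Dict[str, Any]
ALLOWED_KEYS = {"target", "past", "future"}


def normalize_window_config(
    config: Dict[str, Union[WindowSpec, List[WindowSpec]]]
) -> Dict[str, List[WindowSpec]]:
    """Return a copy of `config` where every value is a list of spec dicts.

    Hypothetical helper for illustration only; not part of Darts.
    """
    unknown = set(config) - ALLOWED_KEYS
    if unknown:
        raise ValueError(f"Unknown keys: {sorted(unknown)}")

    normalized: Dict[str, List[WindowSpec]] = {}
    for key, value in config.items():
        # A single dict is shorthand for a one-element list of specs.
        specs = [value] if isinstance(value, dict) else list(value)
        for spec in specs:
            if "function" not in spec:
                raise ValueError(f"Spec under '{key}' has no 'function' entry")
        normalized[key] = specs
    return normalized


config = {
    "future": {"function": "mean", "window_size": 10},
    "past": [
        {"function": "ewma", "window_size": 5},
        {"function": "mean", "window_size": 3},
    ],
}
normalized = normalize_window_config(config)
```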
Available functions:
- There is a little bit of awkwardness around functions that require aggregation. For example, for an ewma transformation we need to apply ewm(window_size).mean(), which cannot easily be passed to the dictionary without wrapping it inside a custom function.
- One option would be to allow an aggregation parameter in the dictionary, which can take values such as rolling or ewm.
- My preference is to have a list of common use cases coded so users can pass 'ewma' or 'rolling_mean' as strings. Then, when a user has a use case that falls outside these predefined cases, we advise them to write their own function to pass to the dictionary.
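To make the "predefined names plus custom functions" idea concrete, here is one possible sketch using pandas. The registry `NAMED_WINDOW_FUNCTIONS` and the helper `resolve_window_function` are hypothetical names, and mapping `window_size` to the `span` argument of `ewm` is an assumption for this example (the thread does not say which decay parameter is meant).

```python
from typing import Callable, Dict, Union

import pandas as pd

# A window function takes a DataFrame and a window size, returns a DataFrame.
WindowFn = Callable[[pd.DataFrame, int], pd.DataFrame]

# Hypothetical registry of common cases, hiding the rolling/ewm distinction.
NAMED_WINDOW_FUNCTIONS: Dict[str, WindowFn] = {
    "rolling_mean": lambda df, n: df.rolling(window=n).mean(),
    # Assumption: window_size is interpreted as the EWM span.
    "ewma": lambda df, n: df.ewm(span=n).mean(),
}


def resolve_window_function(function: Union[str, WindowFn]) -> WindowFn:
    """Return a callable for a predefined name, or the callable itself."""
    if callable(function):
        return function
    try:
        return NAMED_WINDOW_FUNCTIONS[function]
    except KeyError:
        raise ValueError(
            f"Unknown window function '{function}'; "
            f"expected one of {sorted(NAMED_WINDOW_FUNCTIONS)} or a callable"
        ) from None


df = pd.DataFrame({"y": [1.0, 2.0, 3.0, 4.0]})
rolling = resolve_window_function("rolling_mean")(df, 2)
ewma = resolve_window_function("ewma")(df, 2)
# A user-defined function is passed through unchanged.
custom = resolve_window_function(lambda d, n: d.rolling(window=n).max())(df, 2)
```

With this shape, users never need to know whether a name is backed by `rolling` or `ewm`, which sidesteps the separate aggregation parameter.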
Sounds great. You could even do it simpler and not support specifying the parameters (param_1 etc.) in the function specification. We can assume that the provided function always works on a window dataframe and does not need extra parameters (which for users I think would be easy to manage, using partial functions for instance). That would also avoid awkward cases where a user-provided function has an argument named window_size.
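A small sketch of the partial-function approach: extra parameters are bound up front with `functools.partial`, so the specification only carries 'function' and 'window_size'. The names `clipped_mean` and `apply_spec` are hypothetical, and a plain list stands in for the window dataframe to keep the example dependency-free.

```python
from functools import partial
from typing import Any, Dict, List, Sequence


def clipped_mean(window: Sequence[float], lower: float, upper: float) -> float:
    """Mean of the window after clipping each value to [lower, upper]."""
    clipped = [min(max(x, lower), upper) for x in window]
    return sum(clipped) / len(clipped)


# Bind the extra parameters so the function only needs the window itself.
window_fn = partial(clipped_mean, lower=0.0, upper=10.0)
spec: Dict[str, Any] = {"function": window_fn, "window_size": 3}


def apply_spec(values: Sequence[float], spec: Dict[str, Any]) -> List[float]:
    """Apply spec['function'] to each full trailing window of `values`."""
    n = spec["window_size"]
    fn = spec["function"]
    return [fn(values[i - n + 1 : i + 1]) for i in range(n - 1, len(values))]


result = apply_spec([1.0, 50.0, 4.0, -5.0], spec)
```

Because the window size lives only in the specification, a user function can even have its own argument called `window_size` without clashing.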
- My preference is to have a list of common use cases coded so users can pass 'ewma' or 'rolling_mean' as strings. Then, when a user has a use case that falls outside these predefined cases, we advise them to write their own function to pass to the dictionary.
Agree, sounds good 👍
Hi @adamkells, are you getting a chance to work on this issue? I'm asking because it is quite key for our roadmap. There is no rush, but if you are unsure, we can maybe take it up. Let us know :)
Hi @hrzn Sorry about the delay, I've set time aside to do open-source work every second Friday so will get a run at it tomorrow. I'm happy to hand it over after tomorrow if you want to finish it off, at any rate I could probably use a bit of help with the testing etc.
No worries @adamkells, we really appreciate your efforts. It would be great if after next Friday you could maybe open a draft PR of what you have so far, and we can work collaboratively on this from there. Thanks!
@hrzn Have opened a draft PR with the work so far. Apologies it's a bit of a mess, let me know how I can help improve it.
Thanks @adamkells, I'll try to look at it sometime soon