Allow more “step -a” options to specify the number of records back
This is a feature request. Please allow more step -a options to specify the number of records back to be referenced.
Currently, the slwin sliding-window averages option requires an _m_n suffix to specify how many records back (m) and how many records forward (n) to include.
It would be incredibly helpful if at least shift_lag and a few other step options accepted a similar suffix:
mlr step -a shift_lag_12 -f Sales would reference the record 12 back and create, e.g., a field called Sales_12 holding sales from 12 months earlier.
Other options, such as shift_lead, delta and ratio, would clearly benefit from this possibility as well.
My use case is database-query post-processing, analysis and preparation for machine learning. I frequently work with sequenced or time-series data and need to compare current values against the same attributes lagged by specific periods of time. I’m in the process of migrating and automating all my post-processing using Miller, which has been fantastic (thanks!).
I currently achieve the multiple-lag reference by then-chaining the shift option, as in mlr step -a shift -f Sales then step -a shift -f Sales_shift then…., and then renaming the fields and deleting the unnecessary ones. This seems to work but is rather lengthy and unfriendly.
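To make the request concrete, here is a rough Python sketch of the behaviour I have in mind. It is not Miller code: `shift_lag` and `chained_shift` are hypothetical helper names, the `-` placeholder for records with nothing to look back to is an assumption, and the `Sales_2` output name just follows the `Sales_12` naming suggested above. It only illustrates that a single n-record lag should give the same result as n chained one-record shifts followed by the rename and cleanup.

```python
from typing import Dict, List

Record = Dict[str, str]

# Assumption: value emitted when there is no record n steps back.
PLACEHOLDER = "-"


def shift_lag(records: List[Record], field: str, n: int = 1,
              out_field: str = "") -> List[Record]:
    """Return copies of the records with `field` from n records back added.

    Hypothetical stand-in for the requested n-record lag option.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    name = out_field or f"{field}_{n}"
    out = []
    for i, rec in enumerate(records):
        new = dict(rec)  # copy so the input is left untouched
        new[name] = records[i - n][field] if i >= n else PLACEHOLDER
        out.append(new)
    return out


def chained_shift(records: List[Record], field: str, n: int) -> List[Record]:
    """Emulate the workaround: n one-record shifts, then cleanup and rename."""
    if n < 1:
        raise ValueError("n must be at least 1")
    out = [dict(r) for r in records]
    current = field
    for _ in range(n):
        shifted = f"{current}_shift"
        out = shift_lag(out, current, 1, out_field=shifted)
        if current != field:
            for r in out:
                del r[current]  # drop the intermediate shifted column
        current = shifted
    for r in out:
        r[f"{field}_{n}"] = r.pop(current)  # rename the final column
    return out


if __name__ == "__main__":
    sales = [{"Month": str(m), "Sales": str(100 + m)} for m in range(1, 6)]
    lagged = shift_lag(sales, "Sales", 2)
    assert lagged == chained_shift(sales, "Sales", 2)
    print([r["Sales_2"] for r in lagged])  # ['-', '-', '101', '102', '103']
```

The direct version does one pass and needs no intermediate fields, which is the convenience I am hoping a suffix on the step option could provide.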
Thanks so much.
@AndyXuma awesome!! I knew when working on slwin that I was creating (within the code) some more general opportunities -- and I hoped there would be demand for them. I'm happy to hear that there is!! :)