Allow more “step -a” options to specify the number of records back

Open · AndyXuma opened this issue 3 years ago · 1 comment

This is a feature request. Please allow more step -a options to accept a suffix specifying how many records back to reference. Currently, the slwin sliding-window averages option requires an _m_n suffix giving the number of records back (m) and forward (n) to include. It would be very helpful if shift_lag and a few other step options accepted a similar suffix: for example, mlr step -a shift_lag_12 -f Sales would look 12 records back and create a field such as Sales_12, holding the sales figure from 12 months earlier. Other options, such as shift_lead, delta and ratio, would clearly benefit from this as well.
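To make the request concrete, here is a minimal Python sketch of the behaviour being asked for. It is not Miller code: the function name shift_lag_n, the output field name (taken from the Sales_12 example above), and the "-" placeholder for records with no history are all assumptions made for illustration.

```python
def shift_lag_n(records, field, n, placeholder="-"):
    """Return copies of `records` with `field` lagged by `n` records.

    The lagged value is stored under "<field>_<n>" (hypothetical naming,
    following the Sales_12 example). Records that have fewer than `n`
    predecessors get `placeholder` instead.
    """
    out = []
    for i, rec in enumerate(records):
        new = dict(rec)  # copy so the input records are left untouched
        new[f"{field}_{n}"] = records[i - n][field] if i >= n else placeholder
        out.append(new)
    return out


# Illustrative data only: 14 months of made-up sales figures.
monthly = [{"Month": m, "Sales": 100 + 10 * m} for m in range(1, 15)]
lagged = shift_lag_n(monthly, "Sales", 12)
```

With this data, the first 12 records carry the placeholder in Sales_12, and the 13th record carries the Sales value of the 1st.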

My use case is database-query post-processing, analysis and preparation for machine learning. I frequently work with sequenced or time-series data and need to compare current values against the same attributes lagged by specific periods of time. I’m in the process of migrating and automating all my post-processing using Miller, which has been fantastic (thanks!).

I currently get multi-record lags by then-chaining the shift option, as in mlr step -a shift -f Sales then step -a shift -f Sales_shift then…., and then renaming the resulting fields and deleting the intermediate ones. This seems to work but is rather lengthy and unfriendly.
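The chaining workaround can be modelled in Python to show why it is equivalent to a single n-record lag. This is only a sketch under assumptions: shift_once imitates a one-record shift that writes to "<field>_shift" with "-" for the first record, and chained_lag is a hypothetical helper that repeats it n times, then renames the last field and drops the intermediates.

```python
def shift_once(records, field, placeholder="-"):
    """One-record shift: add "<field>_shift" holding the previous record's value."""
    out = []
    prev = placeholder
    for rec in records:
        new = dict(rec)
        new[f"{field}_shift"] = prev
        prev = rec[field]
        out.append(new)
    return out


def chained_lag(records, field, n):
    """Apply shift_once n times, each pass shifting the previous pass's output."""
    current = field
    for _ in range(n):
        records = shift_once(records, current)
        current = f"{current}_shift"
    result = []
    for rec in records:
        # Drop the intermediate *_shift fields and keep one renamed result field.
        new = {k: v for k, v in rec.items() if not k.startswith(f"{field}_shift")}
        new[f"{field}_{n}"] = rec[current]
        result.append(new)
    return result


# Illustrative data only.
data = [{"Sales": v} for v in [10, 20, 30, 40, 50]]
lag3 = chained_lag(data, "Sales", 3)
```

Each pass adds one more field to every record, which is why the chained form grows with the lag length and needs the rename and cleanup step at the end.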

Thanks so much.

AndyXuma, Mar 26 '22 12:03

@AndyXuma awesome!! I knew when working on slwin that I was creating (within the code) some more general opportunities -- and I hoped there would be demand for them. I'm happy to hear that there is!! :)

johnkerl, Mar 26 '22 16:03