datatable icon indicating copy to clipboard operation
datatable copied to clipboard

Rolling aggregate support based on windows within a DT

Open atroiano opened this issue 6 years ago • 5 comments

I'd like to see the ability to get different rolling aggregations of my dataset based on order and grouping columns. Pandas has robust support for these type of actions. https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.rolling.html. It would also be nice to easily assign these to new columns without having to make nested for loops.
Below is a function I use in Pandas to achieve this functionality

def create_roll_columns (x,g_c,roll,roll_cols,roll_types):
    #roll types [sum,mean,min]
    #g_c needs to be a list
    #lag will set the number of lag cols
    for i in roll_cols:
        for j in range(2,roll+1):
            for aggregation in roll_types:
                if aggregation == 'sum':
                    nm = '{0}_sum_{1}'.format(i, str(j))
                    x[nm] = x.groupby(g_c)[i].rolling(j, min_periods=1).sum().reset_index(0,drop=True)
                if aggregation == 'mean':
                    nm = '{0}_mean_{1}'.format(i, str(j))
                    x[nm] = x.groupby(g_c)[i].rolling(j, min_periods=1).mean().reset_index(0,drop=True)
                if aggregation == 'min':
                    nm = '{0}_min_{1}'.format(i, str(j))
                    x[nm] = x.groupby(g_c)[i].rolling(j, min_periods=1).min().reset_index(0,drop=True)              
    return(x)  

Example code could be something like - df(select=mean(f.x), group="y",window = 3, align='r')
-group = windowing columns for the select statement -window = many rows is looks back or forward in a given group

  • Align is right, left, center and coordinates where the window happens relative to the given record.

atroiano avatar Dec 20 '18 17:12 atroiano

I'm going to remove lead/lag from here, since it is a separate feature, and deserves its own issue.

The rolling functions are an important feature to add; there is a similar work being done in R data.table as well: https://github.com/Rdatatable/data.table/issues/2778

st-pasha avatar Dec 21 '18 00:12 st-pasha

The R data.table implementation (as of now rollmean) was designed to be R agnostic, so we should be able to reduce maintanence effort by reusing that code.

jangorecki avatar Dec 21 '18 02:12 jangorecki

@st-pasha Do you want me to raise a ticket about lead/lag?

atroiano avatar Dec 21 '18 14:12 atroiano

@atroiano Yes, please do. I guess the function name will be shift(n), in concord with R data.table, and also because it nicely allows both positive and negative ns.

st-pasha avatar Dec 21 '18 18:12 st-pasha

In R we use zoo or own implementation of rolling function with next syntax:

DT[, ref_price := roll(price, fill = T, width = 16, fun = max), by = .(banner, region, sku)]

Is there way to implement with similar syntax?

Viktor-Demin avatar Mar 18 '19 18:03 Viktor-Demin