rbc icon indicating copy to clipboard operation
rbc copied to clipboard

Support accessing neighboring rows within a UDF

Open pearu opened this issue 4 years ago • 1 comments

https://quansight.slack.com/archives/C9U5ZTWHX/p1605197922270000:

Todd has come across a strong use case for being able to zip the output of a udtf to the table they are being run on, assuming a 1:1 i/o mapping (every row in gets a row/column out in the same slot). Effectively it’s a lot semantically like a udf, except that we need to access neighboring rows.

Aim:

select x, y, foo(x) from table_x_y

where foo is supposed to compute the local average of x.

This could be currently modelled using a UDTF:

@omnisci('Cursor<double, double>, RowMultiplier, OutputColumn<double>, OutputColumn<double>, OutputColumn<double>')
def ext_foo(x, y, m, x_out, y_out, avg_x):
    for i in range(len(x)):
        x_out[i] = x[i]
        y_out[i] = y[i]
        if i>0 and i<len(x)-1:
            avg_x[i] = (x[i + 1] + x[i] + x[i - 1])/3.0
        else:
            avg_x[i] = x[i]
    return len(x)

and the SQL statement becomes:

select out0, out1, out2 from table(ext_foo(cursor(select x, y from table_x_y), 1))

pearu avatar Nov 17 '20 09:11 pearu