rbc
rbc copied to clipboard
Support accessing neighboring rows within a UDF
https://quansight.slack.com/archives/C9U5ZTWHX/p1605197922270000:
Todd has come across a strong use case for being able to zip the output of a udtf to the table they are being run on, assuming a 1:1 i/o mapping (every row in gets a row/column out in the same slot). Effectively it’s a lot semantically like a udf, except that we need to access neighboring rows.
Aim:
select x, y, foo(x) from table_x_y
where foo
is supposed to compute the local average of x.
This could be currently modelled using a UDTF:
@omnisci('Cursor<double, double>, RowMultiplier, OutputColumn<double>, OutputColumn<double>, OutputColumn<double>')
def ext_foo(x, y, m, x_out, y_out, avg_x):
for i in range(len(x)):
x_out[i] = x[i]
y_out[i] = y[i]
if i>0 and i<len(x)-1:
avg_x[i] = (x[i + 1] + x[i] + x[i - 1])/3.0
else:
avg_x[i] = x[i]
return len(x)
and the SQL statement becomes:
select out0, out1, out2 from table(ext_foo(cursor(select x, y from table_x_y), 1))