Consider changing the UDTF syntax to match the one used in OmniSciDB

Open guilhermeleobas opened this issue 4 years ago • 1 comments

@pearu, I'd to bring this to discussion. Currently, the RBC syntax used to declare a UDTF function is a bit different from the one used in OmniSciDB. This might bring some confusion to users if they try to replicate the same syntax from a product into another. For instance, RBC contains a set of syntax-sugars to declare output columns, namely OutputColumn<T>, whereas output columns in OmniSciDB are declared after the -> symbol:

@omnisci('int32(Column<double>, Constant<1>, OutputColumn<double>)
def pyfunc(x, y):
    y[0] = x[0]
    return 1

/*
    UDTF: int32_t cpp_func(Column<double>, Constant<1>) -> Column<double>
*/
int32_t cpp_func(const Column<double> &input, Column<double> &out) {
    out[0] = input[0];
    return 1;
}

Another example is in the UDTF Normalize notebook, where one of the examples use int32|sizer=RowMultiplier to name a sizer. This notation should be deprecated (or not used in notebooks), and a user should only use RowMultiplier to refer to it. Similar to what is done in OmniSciDB.

Oct 19 '21 17:10 guilhermeleobas

First, yes, I agree that declaring UDTF from rbc and from C++ UDTF annotations ought to use the same syntactic sugar. Here's a suggestion: in rbc.typesystem, when -> is encountered, it will translate what follows to output columns. For instance, we should have

Type.fromstring('int32(Column<double>, OutputColumn<double>)') == Type.fromstring('int32(Column<double>) -> Column<double>')

Second, I agree, sizer should not be used in examples, tests, nor notebooks. Although, internally sizer is an annotation (IIRC) and int32|sizer=RowMultiplier is the internal representation of RowMultiplier and hence cannot be deprecated per se.

Oct 19 '21 18:10 pearu