Consider changing the UDTF syntax to match the one used in OmniSciDB
@pearu, I'd to bring this to discussion. Currently, the RBC syntax used to declare a UDTF function is a bit different from the one used in OmniSciDB. This might bring some confusion to users if they try to replicate the same syntax from a product into another. For instance, RBC contains a set of syntax-sugars to declare output columns, namely OutputColumn<T>, whereas output columns in OmniSciDB are declared after the -> symbol:
@omnisci('int32(Column<double>, Constant<1>, OutputColumn<double>)
def pyfunc(x, y):
y[0] = x[0]
return 1
/*
UDTF: int32_t cpp_func(Column<double>, Constant<1>) -> Column<double>
*/
int32_t cpp_func(const Column<double> &input, Column<double> &out) {
out[0] = input[0];
return 1;
}
Another example is in the UDTF Normalize notebook, where one of the examples use int32|sizer=RowMultiplier to name a sizer. This notation should be deprecated (or not used in notebooks), and a user should only use RowMultiplier to refer to it. Similar to what is done in OmniSciDB.
First, yes, I agree that declaring UDTF from rbc and from C++ UDTF annotations ought to use the same syntactic sugar.
Here's a suggestion: in rbc.typesystem, when -> is encountered, it will translate what follows to output columns. For instance, we should have
Type.fromstring('int32(Column<double>, OutputColumn<double>)') == Type.fromstring('int32(Column<double>) -> Column<double>')
Second, I agree, sizer should not be used in examples, tests, nor notebooks. Although, internally sizer is an annotation (IIRC) and int32|sizer=RowMultiplier is the internal representation of RowMultiplier and hence cannot be deprecated per se.