rbc
UDTF: Collect multiple input columns to a single Matrix/Columns argument
Example:
@omnisci('int32(Matrix<T>, int32, RowMultiplier, OutputColumn<T>)', T=['double', 'float', 'int64', 'int32', ...])
def get_column_by_index(matrix, column_index, m, output):
    # Copy the selected column only when the index is within bounds
    if 0 <= column_index < matrix.shape[1]:
        for i in range(matrix.shape[0]):
            output[i] = matrix[i, column_index]
        # Number of output rows
        return matrix.shape[0]
    return 0
SQL query:
select out0 from table(get_column_by_index(cursor(select cast(a as double), cast(b as double) from mytable), 0, 1))
would return the content of mytable.a as a double column in out0.
Notes:
- the column data is stored in matrix row-wise to facilitate efficient use of library functions that expect row-wise (C) storage order
- for column-wise storage order, use Columns<T>, which will be useful for library functions expecting column-wise (Fortran) storage order
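To make the two storage orders concrete, here is a minimal pure-Python sketch. It is only an illustration: the lists a and b are hypothetical stand-ins for two table columns, and the flat buffers mimic how a row-wise Matrix<T> and a column-wise Columns<T> argument might lay out the same data. Nothing here is rbc API.

```python
# Hypothetical stand-ins for mytable.a and mytable.b (not real data)
a = [1.0, 2.0, 3.0]
b = [10.0, 20.0, 30.0]
columns = [a, b]
nrows, ncols = len(a), len(columns)

# Row-wise (C) order: the elements of one table row are adjacent in memory
row_major = [columns[j][i] for i in range(nrows) for j in range(ncols)]

# Column-wise (Fortran) order: the elements of one column are adjacent in memory
col_major = [columns[j][i] for j in range(ncols) for i in range(nrows)]


def at_row_major(i, j):
    # Element (row i, column j) in the row-wise buffer
    return row_major[i * ncols + j]


def at_col_major(i, j):
    # Element (row i, column j) in the column-wise buffer
    return col_major[j * nrows + i]
```

Both accessors return the same logical element; only the position in the flat buffer differs, which is what matters when handing the buffer to a library routine that assumes one order or the other.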
The same UDTF could be rewritten as:
@omnisci('int32(Matrix<T>, int32, RowMultiplier, OutputColumn<T>)', T=['double', 'float', 'int64', 'int32', ...])
def get_column_by_index(matrix, column_index, m, output):
    if 0 <= column_index < matrix.shape[1]:
        # Slice assignment copies the whole column at once
        output[:] = matrix[:, column_index]
        return matrix.shape[0]
    return 0
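A quick way to check that the loop and slice forms agree is to run both bodies on a plain NumPy array. This is a sketch only: it assumes the proposed Matrix<T> would expose NumPy-like shape and 2-D indexing, and that output holds exactly matrix.shape[0] elements (a row multiplier of 1). The function names with _loop and _slice suffixes are hypothetical and the decorator is left out, so nothing here exercises rbc itself.

```python
import numpy as np


def get_column_by_index_loop(matrix, column_index, m, output):
    # Element-by-element copy, mirroring the first version
    if 0 <= column_index < matrix.shape[1]:
        for i in range(matrix.shape[0]):
            output[i] = matrix[i, column_index]
        return matrix.shape[0]
    return 0


def get_column_by_index_slice(matrix, column_index, m, output):
    # Whole-column copy, mirroring the rewritten version
    if 0 <= column_index < matrix.shape[1]:
        output[:] = matrix[:, column_index]
        return matrix.shape[0]
    return 0


# Hypothetical data: three rows, two columns (a, b), stored row-wise
matrix = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])

out_loop = np.zeros(matrix.shape[0])
out_slice = np.zeros(matrix.shape[0])
n_loop = get_column_by_index_loop(matrix, 0, 1, out_loop)
n_slice = get_column_by_index_slice(matrix, 0, 1, out_slice)

# An out-of-range column index produces zero output rows
n_bad = get_column_by_index_slice(matrix, 5, 1, np.zeros(matrix.shape[0]))
```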