
UDTF: Collect multiple input columns to a single Matrix/Columns argument

Open pearu opened this issue 4 years ago • 0 comments

Example:

@omnisci('int32(Matrix<T>, int32, RowMultiplier, OutputColumn<T>)',
         T=['double', 'float', 'int64', 'int32', ...])
def get_column_by_index(matrix, column_index, m, output):
    # matrix has one row per input row and one column per cursor column
    if 0 <= column_index < matrix.shape[1]:
        for i in range(matrix.shape[0]):
            output[i] = matrix[i, column_index]
        return matrix.shape[0]  # number of output rows
    return 0  # out-of-range column index: empty result

SQL query:

select out0 from table(get_column_by_index(cursor(select cast(a as double), cast(b as double) from mytable), 0, 1))

would return the contents of mytable.a, cast to double, as the output column out0.
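To make the intended semantics concrete, here is a pure-NumPy sketch that emulates the query above outside the database. It is not rbc or OmniSciDB API: the function body mirrors the proposed UDTF, `np.column_stack` stands in for collecting the cursor columns into a row-wise matrix, and the contents of `mytable.a` and `mytable.b` are hypothetical sample data.

```python
import numpy as np

def get_column_by_index(matrix, column_index, output):
    """NumPy stand-in for the proposed UDTF (not the rbc API).

    Copies column `column_index` of `matrix` into `output` and returns the
    number of rows written, or 0 if the index is out of range.
    """
    nrows, ncols = matrix.shape
    if 0 <= column_index < ncols:
        for i in range(nrows):
            output[i] = matrix[i, column_index]
        return nrows
    return 0

# Hypothetical contents of mytable.a and mytable.b
a = np.array([1, 2, 3], dtype=np.int32)
b = np.array([10, 20, 30], dtype=np.int64)

# Emulates cursor(select cast(a as double), cast(b as double) from mytable):
# each cursor row becomes one row of a C-ordered (row-wise) matrix.
matrix = np.column_stack([a.astype(np.float64), b.astype(np.float64)])

# RowMultiplier m=1: the output has as many rows as the input.
out0 = np.empty(matrix.shape[0] * 1, dtype=matrix.dtype)
n = get_column_by_index(matrix, 0, out0)
print(n, out0.tolist())  # 3 [1.0, 2.0, 3.0]
```

An out-of-range index such as 2 returns 0, matching the empty result the UDTF would produce.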

Notes:

  • the column data is stored in the matrix row-wise (C order), which allows efficient use of library functions that expect row-major storage
  • for column-wise storage, use Columns<T> instead; it suits library functions that expect column-major (Fortran) storage order
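The difference between the two layouts can be illustrated with NumPy, used here only as an analogy: a C-ordered array plays the role of `Matrix<T>` and a Fortran-ordered array plays the role of `Columns<T>`. The sample columns are hypothetical, and nothing here reflects how rbc actually allocates these arguments.

```python
import numpy as np

# Two hypothetical input columns, already cast to double.
a = np.array([1.0, 2.0, 3.0])
b = np.array([10.0, 20.0, 30.0])

# Matrix<T>-like layout: row-wise (C order), each input row is contiguous.
m_c = np.ascontiguousarray(np.column_stack([a, b]))
# Columns<T>-like layout: column-wise (Fortran order), each column is contiguous.
m_f = np.asfortranarray(np.column_stack([a, b]))

# Same logical values, different memory order.
assert (m_c == m_f).all()
print(m_c.ravel(order='K').tolist())  # [1.0, 10.0, 2.0, 20.0, 3.0, 30.0]
print(m_f.ravel(order='K').tolist())  # [1.0, 2.0, 3.0, 10.0, 20.0, 30.0]

# Byte strides (row step, column step) for 8-byte doubles.
print(m_c.strides, m_f.strides)  # (16, 8) (8, 24)
```

A library routine that walks a single input column touches contiguous memory only in the Fortran-ordered case, which is why the choice between the two argument types matters.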

The same UDTF could be rewritten more concisely using slice assignment:

@omnisci('int32(Matrix<T>, int32, RowMultiplier, OutputColumn<T>)',
         T=['double', 'float', 'int64', 'int32', ...])
def get_column_by_index(matrix, column_index, m, output):
    if 0 <= column_index < matrix.shape[1]:
        # copy the whole selected column in one slice assignment
        output[:] = matrix[:, column_index]
        return matrix.shape[0]
    return 0
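Whether `Matrix<T>` would support NumPy-style slicing inside a UDTF is part of the proposal, not something this sketch can confirm. The NumPy version below only shows that the slice-based body has the same effect as the explicit loop; the function name `get_column_by_index_sliced` and the sample matrix are hypothetical.

```python
import numpy as np

def get_column_by_index_sliced(matrix, column_index, output):
    """Hypothetical NumPy analogue of the slice-based UDTF body."""
    nrows, ncols = matrix.shape
    if 0 <= column_index < ncols:
        # Slice assignment copies the strided column view into output.
        output[:] = matrix[:, column_index]
        return nrows
    return 0

# Sample 3x2 row-wise matrix: rows [0, 1], [2, 3], [4, 5].
matrix = np.arange(6, dtype=np.int64).reshape(3, 2)
out = np.zeros(3, dtype=np.int64)
n = get_column_by_index_sliced(matrix, 1, out)
print(n, out.tolist())  # 3 [1, 3, 5]
```

On an out-of-range index the function returns 0 and leaves `output` untouched, exactly like the loop version.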

pearu · Jan 21 '21 15:01