rbc
Variadic input columns support, for discussion
Original description:
Potential variadic input (and output?) capability, by defining a dataframe type that can have any number of columns.
- Perhaps restrict to columns of the same type
- Consider an array/matrix format or framework, making these first-class types in the db with a configurable stride/format, plus fast methods for moving between columnar input and output for interop with the SQL engine
- TM 11/27/2020… most of the ML libs (like mlpack) require matrices of a single type, i.e. float or double. We could use this to define a matrix format, just like we do for Columns, that would accept any number of columns of a given type
- In DAAL & oneDAL we support both cases. If the columns have different types, we still convert them internally to a common type, in a block/tile fashion that preserves cache locality: each block/tile is sized to fit in cache, converted, and processed while it is still in cache. This gives better performance than converting the entire matrix before passing it to the ML algorithm.
- Would then want to support numpy/torch/etc on these
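
To make the block/tile conversion idea from the DAAL/oneDAL comment concrete, here is a minimal NumPy sketch. It is not oneDAL or OmniSciDB code: the function name `blockwise_gram`, the `block_rows` parameter, and the choice of a Gram matrix (`X.T @ X`) as the workload are all assumptions made for illustration. Columns of different numeric types are cast into a small reusable tile one row block at a time, and each tile is consumed right after it is filled, so the full converted matrix is never materialized.

```python
import numpy as np


def blockwise_gram(columns, block_rows=4096, dtype=np.float64):
    """Compute X.T @ X, where X is the matrix whose columns are `columns`.

    Hypothetical helper for illustration only. Columns may have different
    numeric dtypes; they are cast to `dtype` one row block at a time.
    """
    if not columns:
        raise ValueError("at least one column is required")
    n_rows = len(columns[0])
    if any(len(col) != n_rows for col in columns):
        raise ValueError("all columns must have the same length")

    n_cols = len(columns)
    gram = np.zeros((n_cols, n_cols), dtype=dtype)
    # Scratch tile, reused for every block so the working set stays small.
    tile = np.empty((block_rows, n_cols), dtype=dtype)

    for start in range(0, n_rows, block_rows):
        stop = min(start + block_rows, n_rows)
        view = tile[: stop - start]  # the last block may be shorter
        for j, col in enumerate(columns):
            view[:, j] = col[start:stop]  # cast to `dtype` on assignment
        gram += view.T @ view  # consume the tile right after filling it
    return gram


# Usage: three columns of different types, processed two rows at a time.
cols = [
    np.arange(5, dtype=np.int32),
    np.linspace(0.0, 1.0, 5, dtype=np.float32),
    np.ones(5, dtype=np.float64),
]
gram = blockwise_gram(cols, block_rows=2)

# Reference result: convert the whole matrix up front, then multiply.
full = np.column_stack(cols).astype(np.float64)
assert np.allclose(gram, full.T @ full)
```

In a real implementation `block_rows` would presumably be tuned to the cache size and the number of columns; the value here is arbitrary. The same loop shape would also work for a single-type matrix format, where the per-column cast becomes a plain copy.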
https://github.com/omnisci/omniscidb-internal/pull/5274 implements ColumnList for C++ UDTFs
mlpack support is implemented in https://github.com/omnisci/omniscidb-internal/pull/5430