Variadic input columns support, for discussion

Open pearu opened this issue 5 years ago • 2 comments

Original description:

Potential variadic input (and output?) capability by defining a dataframe type that can have any number of columns.

  • Perhaps restrict to columns of the same type
  • Consider an array/matrix format or framework, making these first-class types in the db with configurable stride/format and fast methods for moving between columnar input and output for interop with the SQL engine
  • TM 11/27/2020… most of the ML libs (like mlpack) require matrices of a single type, i.e. float or double. We could use this to define a matrix format, just as we do for Columns, that would accept any number of columns of a given type
    • In DAAL & oneDAL we support both cases. If the columns have different types, we still convert them to some common type internally, in a block/tile fashion that preserves cache locality: each block/tile that fits into cache is converted and then processed while it is still in the cache. This gives better performance than converting the entire matrix before passing it to the ML algorithm.
  • Would then want to support numpy/torch/etc. on these
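
To make the two options above concrete, here is a minimal sketch in plain Python (standard library only). It is not rbc or omniscidb code: `check_same_type`, `iter_row_major_blocks`, `column_sums`, and the `block_rows` parameter are hypothetical names chosen for illustration. The first helper models the "columns of the same type" restriction; the other two model the block/tile idea, where mixed-type columns are converted to double one small row-major block at a time and each block is consumed right after conversion.

```python
from array import array
from typing import Iterator, List, Sequence


def check_same_type(columns: Sequence[array]) -> str:
    """Return the shared typecode, or raise if the columns differ in type."""
    if not columns:
        raise ValueError("at least one column is required")
    typecodes = {col.typecode for col in columns}
    if len(typecodes) != 1:
        raise TypeError(f"columns must share one type, got {sorted(typecodes)}")
    return typecodes.pop()


def iter_row_major_blocks(columns: Sequence[array], block_rows: int) -> Iterator[array]:
    """Yield row-major blocks of doubles, converting one block at a time."""
    if block_rows <= 0:
        raise ValueError("block_rows must be positive")
    if len({len(col) for col in columns}) != 1:
        raise ValueError("columns must have the same number of rows")
    n_rows = len(columns[0])
    for start in range(0, n_rows, block_rows):
        stop = min(start + block_rows, n_rows)
        block = array("d")
        for row in range(start, stop):
            # Only the values of this block are converted to double here.
            block.extend(float(col[row]) for col in columns)
        yield block


def column_sums(columns: Sequence[array], block_rows: int = 2) -> List[float]:
    """Stand-in for an ML kernel: consume each block right after conversion."""
    n_cols = len(columns)
    sums = [0.0] * n_cols
    for block in iter_row_major_blocks(columns, block_rows):
        for i, value in enumerate(block):
            sums[i % n_cols] += value
    return sums


if __name__ == "__main__":
    # Mixed-type input: int, float32, and float64 columns.
    mixed = [
        array("i", [1, 2, 3]),
        array("f", [0.5, 1.5, 2.5]),
        array("d", [10.0, 20.0, 30.0]),
    ]
    print(column_sums(mixed, block_rows=2))
```

In a real engine the block size would be tuned so that one block fits in cache, and the consumer would be an actual ML kernel rather than a column sum; the sketch only shows the control flow of converting per block instead of converting the whole matrix up front.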

pearu, Jan 13 '21 12:01

https://github.com/omnisci/omniscidb-internal/pull/5274 implements ColumnList for C++ UDTFs

pearu, Mar 01 '21 19:03

mlpack support is implemented in https://github.com/omnisci/omniscidb-internal/pull/5430

pearu, Mar 31 '21 12:03