Native kernel binding for cast in CPU backend

Open: wenleix opened this issue 3 years ago • 1 comment

Background: TorchArrow Native Kernel Dispatch

For efficiency, many TorchArrow operations (e.g. INumericalColumn.abs()) are dispatched to Velox C++ native kernels. A Velox native kernel is represented by a compiled Velox ExprSet, which can be evaluated over the RowVector that represents the input data.

As a concrete example, here is how NumericalColumnCpu.abs() gets dispatched in the CPU backend:

The abs operation is delegated to NumericalColumnCpu._data.abs(), here: https://github.com/facebookresearch/torcharrow/blob/7510a2d09f6d58c8b8b7493fe6d64925ff59b0ff/torcharrow/velox_rt/numerical_column_cpu.py#L598-L601

NumericalColumnCpu._data is a SimpleColumnXXX (e.g. SimpleColumnBIGINT: see this PYI file), bound here: https://github.com/facebookresearch/torcharrow/blob/7510a2d09f6d58c8b8b7493fe6d64925ff59b0ff/csrc/velox/lib.cpp#L102-L103

This calls into SimpleColumnBIGINT.abs (and likewise for the other numeric types), which binds to the C++ method here: https://github.com/facebookresearch/torcharrow/blob/7510a2d09f6d58c8b8b7493fe6d64925ff59b0ff/csrc/velox/lib.cpp#L355
The actual implementation of SimpleColumn<T>::abs in C++ is here: https://github.com/facebookresearch/torcharrow/blob/7510a2d09f6d58c8b8b7493fe6d64925ff59b0ff/csrc/velox/column.h#L459-L465

It mainly consists of two steps:

Construct, compile, and cache the ExprSet: https://github.com/facebookresearch/torcharrow/blob/7510a2d09f6d58c8b8b7493fe6d64925ff59b0ff/csrc/velox/column.cpp#L208-L228 Essentially, it generates a CallTypedExpr that calls the function abs over field c0, where c0 is in turn represented as a FieldAccessTypedExpr.
Wrap the input vector into a row vector and evaluate it with Velox: https://github.com/facebookresearch/torcharrow/blob/7510a2d09f6d58c8b8b7493fe6d64925ff59b0ff/csrc/velox/column.cpp#L230-L244 See also the Velox doc on expression evaluation.
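
To make the two steps concrete, here is a toy Python model of the flow. It is only a sketch: FieldAccess, Call, FUNCTIONS, build_unary_expr and unary_op are hypothetical stand-ins invented for illustration, not the Velox or TorchArrow API, and the null handling (None in, None out) is an assumption of the sketch.

```python
from dataclasses import dataclass
from functools import lru_cache
from typing import Callable, Dict, Tuple, Union


# Hypothetical stand-ins for FieldAccessTypedExpr and CallTypedExpr.
@dataclass(frozen=True)
class FieldAccess:
    name: str  # column name inside the row vector, e.g. "c0"


@dataclass(frozen=True)
class Call:
    function: str
    inputs: Tuple["Expr", ...]


Expr = Union[FieldAccess, Call]

# Toy function registry (assumption: the real lookup happens inside Velox).
FUNCTIONS: Dict[str, Callable] = {"abs": abs, "negate": lambda x: -x}


def evaluate(expr, row_vector):
    """Evaluate expr over a row vector given as {column name: list of values}."""
    if isinstance(expr, FieldAccess):
        return row_vector[expr.name]
    args = [evaluate(e, row_vector) for e in expr.inputs]
    fn = FUNCTIONS[expr.function]
    # Assumption of this sketch: any null (None) input yields a null output.
    return [None if any(v is None for v in vals) else fn(*vals)
            for vals in zip(*args)]


# Step 1: construct the expression once per (function, type) and cache it.
@lru_cache(maxsize=None)
def build_unary_expr(function: str, type_name: str) -> Call:
    return Call(function, (FieldAccess("c0"),))


# Step 2: wrap the input as a one-column row vector named c0 and evaluate.
def unary_op(function: str, type_name: str, values):
    expr = build_unary_expr(function, type_name)
    return evaluate(expr, {"c0": values})
```

Calling unary_op("abs", "BIGINT", values) twice reuses the same cached expression object; only the wrapping and evaluation are repeated per call.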

Native Kernel Binding for cast

TorchArrow today contains a prototype implementation of cast, which essentially converts the data into Python objects and performs the cast on them: https://github.com/facebookresearch/torcharrow/blob/7510a2d09f6d58c8b8b7493fe6d64925ff59b0ff/torcharrow/icolumn.py#L238-L253
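
For intuition, the element-by-element approach looks roughly like the sketch below. cast_via_python is a hypothetical helper written for this illustration and is not the code at the link above; it only shows why every value has to pass through the Python interpreter.

```python
from typing import Callable, List, Optional


def cast_via_python(values: List[Optional[object]],
                    py_type: Callable) -> List[Optional[object]]:
    """Cast each element with a Python constructor, keeping nulls as None."""
    out = []
    for v in values:
        # One interpreter-level conversion per element; this is the cost
        # a native kernel would avoid.
        out.append(None if v is None else py_type(v))
    return out
```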

We want to leverage the native Velox kernel to do the cast. Essentially, we want a native cast method SimpleColumn<T>::cast_to_XXX(), similar to SimpleColumn<T>::abs(), bind it to Python, and let NumericalColumnCpu.cast delegate to NumericalColumnCpu._data.castXXX.

Cast is not modelled as a function call in Velox, so instead of creating a CallTypedExpr we need to create a CastTypedExpr this time. Here is an example that constructs a CastTypedExpr in the TorchArrow codebase.
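
The difference can be sketched with a toy expression tree in Python. The names below (FieldAccess, Cast, CONVERTERS, evaluate) are hypothetical and only mirror the shape of the idea: a cast node carries a target type instead of a function name. Rounding, overflow, and error semantics of the real Velox cast are not modelled.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class FieldAccess:
    name: str  # column name inside the row vector, e.g. "c0"


# Hypothetical stand-in for CastTypedExpr: no function name, just a target type.
@dataclass(frozen=True)
class Cast:
    target_type: str
    input: FieldAccess


# Toy type-name-to-converter table (assumption made for this sketch).
CONVERTERS: Dict[str, Callable] = {"BIGINT": int, "DOUBLE": float, "VARCHAR": str}


def evaluate(expr, row_vector):
    """Evaluate expr over a row vector given as {column name: list of values}."""
    if isinstance(expr, FieldAccess):
        return row_vector[expr.name]
    if isinstance(expr, Cast):
        convert = CONVERTERS[expr.target_type]
        values = evaluate(expr.input, row_vector)
        # Nulls stay null; everything else goes through the converter.
        return [None if v is None else convert(v) for v in values]
    raise TypeError(f"unsupported expression node: {expr!r}")
```

Evaluating Cast("DOUBLE", FieldAccess("c0")) over {"c0": [1, None, 3]} converts the integers to floats and leaves the null untouched.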

Thoughts To Start With

We can start with a native cast method with a hard-coded type (e.g. SimpleColumn<T>::cast_to_int8) to get an end-to-end development experience, and later refactor it into a cast method that takes the Type as an input parameter.
We can start with only numerical casts by overriding NumericColumn.cast in Python.
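
One possible shape for that first iteration, sketched in Python with toy classes. ToyNativeColumn stands in for the bound SimpleColumnXXX object and ToyNumericalColumn for the Python column; both are hypothetical, as are the getattr-based lookup and the choice to raise on out-of-range values.

```python
from typing import List, Optional

INT8_MIN, INT8_MAX = -128, 127


class ToyNativeColumn:
    """Hypothetical stand-in for the natively bound column object."""

    def __init__(self, values: List[Optional[float]]):
        self.values = list(values)

    # Iteration 1: a single hard-coded target type.
    def cast_to_int8(self) -> "ToyNativeColumn":
        out = []
        for v in self.values:
            if v is None:
                out.append(None)
                continue
            i = int(v)
            # Assumption of this sketch: out-of-range values raise.
            if not INT8_MIN <= i <= INT8_MAX:
                raise OverflowError(f"{v!r} does not fit in int8")
            out.append(i)
        return ToyNativeColumn(out)


class ToyNumericalColumn:
    """Hypothetical stand-in for the Python-side numerical column."""

    def __init__(self, data: ToyNativeColumn):
        self._data = data

    def cast(self, dtype: str) -> "ToyNumericalColumn":
        # Delegate to the native method if one exists for this target type.
        method = getattr(self._data, f"cast_to_{dtype}", None)
        if method is None:
            raise NotImplementedError(f"no native cast to {dtype}")
        return ToyNumericalColumn(method())
```

Once the hard-coded path works end to end, the per-type methods could collapse into a single native method that receives the target type, and the Python side would pass dtype through instead of building a method name.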

References:

Velox doc about Expression Evaluation: https://facebookincubator.github.io/velox/develop/expression-evaluation.html#expression-trees

Jan 18 '22 19:01 wenleix

cc @scotts

Jan 18 '22 19:01 wenleix
