Native kernel binding for cast in CPU backend
Background: TorchArrow Native Kernel Dispatch
For efficiency, many TorchArrow operations (e.g. INumericalColumn.abs()) are dispatched to Velox C++ native kernels. A Velox native kernel is represented by a compiled Velox ExprSet, which can be evaluated over the RowVector that represents the input data.
As a concrete example, here is how NumericalColumnCpu.abs() gets dispatched in the CPU backend:
- The abs operation is delegated to NumericalColumnCpu._data.abs(), here: https://github.com/facebookresearch/torcharrow/blob/7510a2d09f6d58c8b8b7493fe6d64925ff59b0ff/torcharrow/velox_rt/numerical_column_cpu.py#L598-L601
  NumericalColumnCpu._data is a SimpleColumnXXX (e.g. SimpleColumnBIGINT: see this PYI file), bound here: https://github.com/facebookresearch/torcharrow/blob/7510a2d09f6d58c8b8b7493fe6d64925ff59b0ff/csrc/velox/lib.cpp#L102-L103
- This calls into SimpleColumnBIGINT.abs (and likewise for the other numeric types), which binds to the C++ method here: https://github.com/facebookresearch/torcharrow/blob/7510a2d09f6d58c8b8b7493fe6d64925ff59b0ff/csrc/velox/lib.cpp#L355
- The actual implementation of SimpleColumn<T>::abs in C++: https://github.com/facebookresearch/torcharrow/blob/7510a2d09f6d58c8b8b7493fe6d64925ff59b0ff/csrc/velox/column.h#L459-L465
  which mainly consists of two steps:
  - Construct, compile and cache the ExprSet: https://github.com/facebookresearch/torcharrow/blob/7510a2d09f6d58c8b8b7493fe6d64925ff59b0ff/csrc/velox/column.cpp#L208-L228
    Essentially it generates a CallTypedExpr, which calls the function abs over field c0. c0 is in turn represented as a FieldAccessTypedExpr.
  - Wrap the input vector into a row vector, and evaluate with Velox: https://github.com/facebookresearch/torcharrow/blob/7510a2d09f6d58c8b8b7493fe6d64925ff59b0ff/csrc/velox/column.cpp#L230-L244
    See also the Velox doc about expression evaluation.
Native Kernel Binding for cast
TorchArrow today contains a prototype implementation of cast, which essentially converts the data into Python objects and performs the cast in Python: https://github.com/facebookresearch/torcharrow/blob/7510a2d09f6d58c8b8b7493fe6d64925ff59b0ff/torcharrow/icolumn.py#L238-L253
We want to leverage the native Velox kernel to do the cast. Essentially, we want a native cast method SimpleColumn<T>::cast_to_XXX(), similar to SimpleColumn<T>::abs(), bind it to Python, and let NumericalColumnCpu.cast delegate to NumericalColumnCpu._data.castXXX.
Cast is not modelled as a function call in Velox, so instead of creating a CallTypedExpr we need to create a CastTypedExpr this time. Here is an example that constructs a CastTypedExpr in the TorchArrow codebase.
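The distinction between the two node kinds can be sketched in a few lines of Python. Again, this is a toy model under stated assumptions and not the Velox API: Cast, Call, FieldAccess, CONVERTERS and evaluate are hypothetical names. The point is only that a cast node carries a target type rather than a function name, so the evaluator handles it on a separate path.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Union


@dataclass(frozen=True)
class FieldAccess:
    """Reads one named child of the input row."""
    name: str


@dataclass(frozen=True)
class Call:
    """Function-call node: identified by a function name."""
    function: str
    input: FieldAccess


@dataclass(frozen=True)
class Cast:
    """Cast node: identified by its target type, not by a function name."""
    target_type: str
    input: FieldAccess


# Hypothetical registries for this sketch only.
FUNCTIONS: Dict[str, Callable] = {"abs": abs}
CONVERTERS: Dict[str, Callable] = {"BIGINT": int, "DOUBLE": float, "VARCHAR": str}


def evaluate(expr: Union[Call, Cast], row_vector: Dict[str, List]) -> List:
    """Evaluate a single-input expression over a dict-of-lists row vector."""
    values = row_vector[expr.input.name]
    if isinstance(expr, Cast):
        op = CONVERTERS[expr.target_type]  # looked up by target type
    else:
        op = FUNCTIONS[expr.function]      # looked up by function name
    # Assumption of this sketch: nulls pass through unchanged.
    return [None if v is None else op(v) for v in values]
```

For example, evaluate(Cast("DOUBLE", FieldAccess("c0")), {"c0": [1, None, 3]}) goes through the cast path, while a Call("abs", ...) node over the same field goes through the function registry.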
Thoughts To Start With
- We can start with a native cast method with a hard-coded type (e.g. SimpleColumn<T>::cast_to_int8) to get an end-to-end development experience, and later refactor it into a cast method that takes the Type as an input parameter.
- We can start with only numerical casts, by overriding NumericColumn.cast in Python.
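As a rough illustration of the first stage, the Python side could look up a hard-coded native method by name and delegate to it. The sketch below is hypothetical throughout: FakeNativeColumn stands in for a SimpleColumnXXX binding, NumericalColumnSketch stands in for the Python column class, and raising on out-of-range values is a choice made for this toy, not a statement about how Velox handles overflow.

```python
INT8_MIN, INT8_MAX = -128, 127


class FakeNativeColumn:
    """Hypothetical stand-in for a native SimpleColumnXXX binding."""

    def __init__(self, values, type_name):
        self.values = list(values)
        self.type_name = type_name

    def cast_to_int8(self):
        """Stage 1: a cast method with a hard-coded target type."""
        out = []
        for v in self.values:
            if v is None:
                out.append(None)  # nulls pass through unchanged
                continue
            i = int(v)
            if not INT8_MIN <= i <= INT8_MAX:
                # Toy behaviour only; real overflow semantics may differ.
                raise ValueError(f"{v!r} does not fit in int8")
            out.append(i)
        return FakeNativeColumn(out, "int8")


class NumericalColumnSketch:
    """Hypothetical Python-side column that delegates cast to its _data."""

    def __init__(self, data):
        self._data = data

    def cast(self, dtype_name):
        # Find the hard-coded native method for the requested target type.
        method = getattr(self._data, f"cast_to_{dtype_name}", None)
        if method is None:
            raise NotImplementedError(f"no native cast to {dtype_name}")
        return NumericalColumnSketch(method())
```

Once the native side exposes a single cast method that takes the target type, the getattr lookup in this sketch would collapse into one direct call, and the Python class would not need to change for each new target type.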
References:
- Velox doc about Expression Evaluation: https://facebookincubator.github.io/velox/develop/expression-evaluation.html#expression-trees
cc @scotts