
Native kernel binding for cast in CPU backend

Open · wenleix opened this issue 3 years ago · 1 comment

Background: TorchArrow Native Kernel Dispatch

For efficiency, many TorchArrow operations (e.g. INumericalColumn.abs()) are dispatched to Velox C++ native kernels. A Velox native kernel is represented by a compiled Velox ExprSet, which is evaluated over a RowVector that holds the input data.

As a concrete example, here is how NumericalColumnCpu.abs() gets dispatched in the CPU backend:

  1. The abs operation is delegated to NumericalColumnCpu._data.abs(), here: https://github.com/facebookresearch/torcharrow/blob/7510a2d09f6d58c8b8b7493fe6d64925ff59b0ff/torcharrow/velox_rt/numerical_column_cpu.py#L598-L601

NumericalColumnCpu._data is a SimpleColumnXXX (e.g. SimpleColumnBIGINT; see this PYI file), bound here: https://github.com/facebookresearch/torcharrow/blob/7510a2d09f6d58c8b8b7493fe6d64925ff59b0ff/csrc/velox/lib.cpp#L102-L103

  2. This calls into SimpleColumnBIGINT.abs (and likewise for the other numeric types), which is bound to a C++ method here: https://github.com/facebookresearch/torcharrow/blob/7510a2d09f6d58c8b8b7493fe6d64925ff59b0ff/csrc/velox/lib.cpp#L355

  3. The actual implementation of SimpleColumn<T>::abs in C++: https://github.com/facebookresearch/torcharrow/blob/7510a2d09f6d58c8b8b7493fe6d64925ff59b0ff/csrc/velox/column.h#L459-L465

which mainly consists of two steps:

  • Construct, compile, and cache the ExprSet (https://github.com/facebookresearch/torcharrow/blob/7510a2d09f6d58c8b8b7493fe6d64925ff59b0ff/csrc/velox/column.cpp#L208-L228). Essentially, it builds a CallTypedExpr that calls the function abs on field c0, which is itself represented as a FieldAccessTypedExpr.

  • Wrap the input vector into a RowVector and evaluate it with Velox (https://github.com/facebookresearch/torcharrow/blob/7510a2d09f6d58c8b8b7493fe6d64925ff59b0ff/csrc/velox/column.cpp#L230-L244). See also the Velox documentation on expression evaluation.
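The dispatch chain and the two steps above can be modelled in a few lines of plain Python. This is a toy sketch under stated assumptions, not the Velox or TorchArrow API: FieldAccess, Call, SimpleColumn, and NumericalColumnCpu below are hypothetical stand-ins for FieldAccessTypedExpr, CallTypedExpr, the pybind-bound SimpleColumn<T>, and the Python wrapper; a Python list with None for nulls stands in for a Velox vector, and a dict keyed by field name stands in for the RowVector.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Values = List[Optional[int]]

# Hypothetical function registry standing in for Velox's registered functions.
FUNCTIONS: Dict[str, Callable[[int], int]] = {"abs": abs}


@dataclass(frozen=True)
class FieldAccess:
    """Toy stand-in for FieldAccessTypedExpr: reads one named child of the row."""
    name: str

    def eval(self, row: Dict[str, Values]) -> Values:
        return row[self.name]


@dataclass(frozen=True)
class Call:
    """Toy stand-in for CallTypedExpr: applies a named function element-wise."""
    function: str
    arg: FieldAccess

    def eval(self, row: Dict[str, Values]) -> Values:
        fn = FUNCTIONS[self.function]
        # Nulls (None) pass through without calling the function.
        return [None if v is None else fn(v) for v in self.arg.eval(row)]


class SimpleColumn:
    """Toy stand-in for the native SimpleColumn<T>."""

    # Shared cache: each expression is built once and then reused.
    _expr_cache: Dict[str, Call] = {}

    def __init__(self, values: Values):
        self.values = values

    @classmethod
    def _compiled(cls, function: str) -> Call:
        # Step 1: construct the expression on first use, then cache it.
        if function not in cls._expr_cache:
            cls._expr_cache[function] = Call(function, FieldAccess("c0"))
        return cls._expr_cache[function]

    def abs(self) -> "SimpleColumn":
        expr = self._compiled("abs")
        # Step 2: wrap the input as a one-field row whose only child is c0.
        row = {"c0": self.values}
        return SimpleColumn(expr.eval(row))


class NumericalColumnCpu:
    """Toy Python-side column that delegates the work to its _data."""

    def __init__(self, data: SimpleColumn):
        self._data = data

    def abs(self) -> "NumericalColumnCpu":
        return NumericalColumnCpu(self._data.abs())


col = NumericalColumnCpu(SimpleColumn([-3, None, 5]))
print(col.abs()._data.values)  # [3, None, 5]
```

In this sketch the expression is built once per function name and reused for every column, which mirrors the "construct, compile and cache" step; evaluation is the only per-call work.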

Native Kernel Binding for cast

TorchArrow currently has a prototype implementation of cast, which converts the data into Python objects and performs the cast on those objects: https://github.com/facebookresearch/torcharrow/blob/7510a2d09f6d58c8b8b7493fe6d64925ff59b0ff/torcharrow/icolumn.py#L238-L253
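To make the cost of that approach concrete, here is a toy sketch of the same shape (not the linked code). It assumes the column has been materialized as a Python list with None for nulls; python_cast is a hypothetical helper.

```python
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")


def python_cast(values: List[Optional[object]],
                convert: Callable[[object], T]) -> List[Optional[T]]:
    """Cast one element at a time in the interpreter, keeping nulls as None.

    Each element is a Python object and each conversion is a Python call;
    this per-element overhead is what a native Velox kernel avoids.
    """
    return [None if v is None else convert(v) for v in values]


print(python_cast([1, None, 3], float))  # [1.0, None, 3.0]
```

Note that Python's int() truncates toward zero (int(-1.9) == -1); whether the Velox cast applies the same rule to floating-point inputs is worth checking when comparing the two paths.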

We want to use a native Velox kernel to perform the cast instead. Concretely, we want native cast methods SimpleColumn<T>::cast_to_XXX(), analogous to SimpleColumn<T>::abs(), bound to Python, so that NumericalColumnCpu.cast delegates to NumericalColumnCpu._data.cast_to_XXX().

In Velox, cast is not modelled as a function call, so instead of a CallTypedExpr we need to create a CastTypedExpr. Here is an example that constructs a CastTypedExpr in the TorchArrow codebase.
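The structural difference can be illustrated with a toy expression tree: a call node is identified by a function name, while a cast node carries a target type instead. The classes below are hypothetical Python stand-ins for FieldAccessTypedExpr, CallTypedExpr, and CastTypedExpr, and the CONVERTERS table (keyed by the strings "BIGINT" and "DOUBLE") is an assumption of this sketch, not how Velox implements casts.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Union

# Hypothetical table mapping a target type name to a Python converter.
CONVERTERS: Dict[str, Callable[[object], object]] = {"BIGINT": int, "DOUBLE": float}


@dataclass(frozen=True)
class FieldAccess:
    """Toy stand-in for FieldAccessTypedExpr."""
    name: str

    def eval(self, row):
        return row[self.name]


@dataclass(frozen=True)
class Call:
    """Toy stand-in for CallTypedExpr: identified by a function name."""
    function: str
    arg: "Expr"

    def eval(self, row):
        fn = {"abs": abs}[self.function]
        return [None if v is None else fn(v) for v in self.arg.eval(row)]


@dataclass(frozen=True)
class Cast:
    """Toy stand-in for CastTypedExpr: carries a target type, not a function name."""
    target_type: str
    arg: "Expr"

    def eval(self, row):
        convert = CONVERTERS[self.target_type]
        return [None if v is None else convert(v) for v in self.arg.eval(row)]


Expr = Union[FieldAccess, Call, Cast]

row = {"c0": [-2, None, 7]}
abs_expr = Call("abs", FieldAccess("c0"))      # abs(c0): a function call
cast_expr = Cast("DOUBLE", FieldAccess("c0"))  # CAST(c0 AS DOUBLE): its own node type
print(abs_expr.eval(row))                  # [2, None, 7]
print(cast_expr.eval(row))                 # [-2.0, None, 7.0]
print(Cast("DOUBLE", abs_expr).eval(row))  # [2.0, None, 7.0]
```

Because both node kinds share the same evaluation interface, a cast can wrap any other expression, as in the last line.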

Thoughts To Start With

  • We can start with a native cast method for a hard-coded target type (e.g. SimpleColumn<T>::cast_to_int8) to get the end-to-end flow working, and later refactor it into a cast method that takes the Type as an input parameter.
  • We can start with only numerical casts, by overriding NumericColumn.cast in Python.
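The suggested progression, first one hard-coded method per target type and later a single cast that takes the type as a parameter, might look like this in a toy Python model. The real methods would be C++ bound to Python; SimpleColumnStage1, SimpleColumnStage2, _to_int8, CASTS, and the dtype strings are all hypothetical, and raising OverflowError for out-of-range int8 values is a choice of this sketch, not a statement about the native kernel's overflow behaviour.

```python
from typing import Callable, Dict, List, Optional


def _to_int8(v: int) -> int:
    # Toy overflow policy: reject out-of-range values instead of wrapping.
    # The native kernel's actual overflow behaviour is not modelled here.
    if not -128 <= v <= 127:
        raise OverflowError(f"{v} does not fit in int8")
    return v


class SimpleColumnStage1:
    """First step: one hard-coded method per target type."""

    def __init__(self, values: List[Optional[object]]):
        self.values = values

    def cast_to_int8(self) -> "SimpleColumnStage1":
        return SimpleColumnStage1(
            [None if v is None else _to_int8(v) for v in self.values])


# Hypothetical dispatch table for the refactored version.
CASTS: Dict[str, Callable[[int], object]] = {
    "int8": _to_int8,
    "int64": int,
    "float64": float,
}


class SimpleColumnStage2:
    """Later step: a single cast() taking the target type as a parameter."""

    def __init__(self, values: List[Optional[object]]):
        self.values = values

    def cast(self, dtype: str) -> "SimpleColumnStage2":
        convert = CASTS[dtype]  # unsupported target types raise KeyError
        return SimpleColumnStage2(
            [None if v is None else convert(v) for v in self.values])


print(SimpleColumnStage1([1, None, -5]).cast_to_int8().values)   # [1, None, -5]
print(SimpleColumnStage2([1, None, -5]).cast("float64").values)  # [1.0, None, -5.0]
```

With the table-driven version, supporting another numeric target type means adding one entry rather than one more method and binding.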

References:

  • Velox documentation on expression evaluation: https://facebookincubator.github.io/velox/develop/expression-evaluation.html#expression-trees

wenleix, Jan 18 '22 19:01

cc @scotts

wenleix, Jan 18 '22 19:01