Eliminate offset and length in BaseColumn

Open scotts opened this issue 3 years ago • 0 comments
We should remove the _offset and _length fields from BaseColumn: https://github.com/facebookresearch/torcharrow/blob/d680bfdc0f6a6bb6c3a29c2a67d62006782d6558/csrc/velox/column.h#L223-L224 There are multiple places where we do not track them properly, such as expression evaluation: https://github.com/facebookresearch/torcharrow/blob/d680bfdc0f6a6bb6c3a29c2a67d62006782d6558/csrc/velox/column.cpp#L236-L238 We should be able to stop tracking them in BaseColumn without losing any functionality.

We may also want to support UDF evaluation on operands that start at different offsets, such as:

a = ta.Column([1, 2, 3])
b = ta.Column([10, 20, 30])

a[:2] + b[2:]

Slicing the vector with the BufferView might be the right solution.
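
To make the idea concrete, here is a rough pure-Python sketch of what "the slice lives in a view, not in the column" could look like. Everything in it is hypothetical: `View` and this toy `Column` are illustrative stand-ins, not the torcharrow or Velox API, and the real BufferView interface may differ. The sketch assumes that a binary op requires equal-length operands, so it uses `b[1:]` to pair with `a[:2]`.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class View:
    """Hypothetical zero-copy window over a shared buffer (not Velox's BufferView)."""

    buffer: Sequence[int]
    offset: int
    length: int

    def __getitem__(self, i: int) -> int:
        if not 0 <= i < self.length:
            raise IndexError(i)
        # The offset is applied here, once, so callers never see it.
        return self.buffer[self.offset + i]

    def slice(self, start: int, stop: int) -> "View":
        # Clamp to this view's bounds; the new view shares the same buffer.
        start = max(0, min(start, self.length))
        stop = max(start, min(stop, self.length))
        return View(self.buffer, self.offset + start, stop - start)


class Column:
    """Hypothetical column that holds only a view and has no _offset/_length."""

    def __init__(self, values):
        if isinstance(values, View):
            self._view = values
        else:
            data = list(values)
            self._view = View(data, 0, len(data))

    def __len__(self) -> int:
        return self._view.length

    def __getitem__(self, s: slice) -> "Column":
        start, stop, step = s.indices(len(self))
        if step != 1:
            raise ValueError("only contiguous slices are supported in this sketch")
        return Column(self._view.slice(start, stop))

    def __add__(self, other: "Column") -> "Column":
        # Assumption of this sketch: operands must have the same length.
        if len(self) != len(other):
            raise ValueError("length mismatch")
        # Each operand resolves its own offset through its view.
        return Column([self._view[i] + other._view[i] for i in range(len(self))])

    def to_list(self) -> list:
        return [self._view[i] for i in range(len(self))]


a = Column([1, 2, 3])
b = Column([10, 20, 30])
result = a[:2] + b[1:]  # operands start at offsets 0 and 1
```

With this shape, expression evaluation only ever sees views that already start at index 0, so there is no separate offset for it to forget about.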

cc: @wenleix

scotts · Feb 04 '22 23:02