torcharrow

High performance model preprocessing library on PyTorch

Results 68 torcharrow issues

We should remove the `_offset` and `_length` in `BaseColumn`: https://github.com/facebookresearch/torcharrow/blob/d680bfdc0f6a6bb6c3a29c2a67d62006782d6558/csrc/velox/column.h#L223-L224 There are multiple places where we do not properly track this, such as in expression evaluation: https://github.com/facebookresearch/torcharrow/blob/d680bfdc0f6a6bb6c3a29c2a67d62006782d6558/csrc/velox/column.cpp#L236-L238 We should be...

# Current Status In TorchArrow, the interface names are `ta.IDataFrame/ta.IColumn` while the factory methods are `ta.DataFrame`/`ta.Column`: ```python import torcharrow as ta a = ta.Column([1, 2, 3]) assert isinstance(a, ta.IColumn) assert...

release blocker

If users create a column from a Python list, we actually dispatch that directly to C++. For example, ``` vals = [1, 2, 3, 4, 5] col = ta.Column(vals, device="cpu")...

native kernel

Column construction from list is optimized with native C++ code (for scalar types), e.g. ```python import torcharrow as ta a = ta.Column([1, 2, 3]) ``` This optimization is not done...

# Native kernel binding for cast in CPU backend ## Background: TorchArrow Native Kernel Dispatch For efficiency, many TorchArrow operations (e.g. `INumericalColumn.abs()`) are dispatched to the Velox C++...

native kernel

Motivating example (the actual dataset has two struct columns, with 13 and 26 fields respectively): ```python dtype = dt.Struct( [ dt.Field("labels", dt.int8), dt.Field("dense_features", dt.Struct([dt.Field("int_1", dt.int32), dt.Field("int_2", dt.int32)])), ] )...

See https://github.com/facebookresearch/torcharrow/pull/100 for details. Another wild idea is to implement `to_pylist` at the C++ `BaseColumn` level, so that the Python object is constructed recursively in C++ code.
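To make the recursive construction concrete, here is a minimal pure-Python sketch of the idea. It is not TorchArrow or Velox code: the `ScalarColumn`, `ListColumn` and `StructColumn` classes are hypothetical stand-ins for column nodes, and representing struct rows as tuples is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Any, List


# Hypothetical column node types, used only to illustrate the recursion.
@dataclass
class ScalarColumn:
    values: List[Any]


@dataclass
class ListColumn:
    offsets: List[int]  # row i spans child[offsets[i]:offsets[i + 1]]
    child: Any


@dataclass
class StructColumn:
    names: List[str]
    children: List[Any]


def to_pylist(col: Any) -> List[Any]:
    """Build the Python representation of a column by recursing into children."""
    if isinstance(col, ScalarColumn):
        return list(col.values)
    if isinstance(col, ListColumn):
        flat = to_pylist(col.child)  # convert the flattened child once
        return [
            flat[col.offsets[i]:col.offsets[i + 1]]
            for i in range(len(col.offsets) - 1)
        ]
    if isinstance(col, StructColumn):
        field_values = [to_pylist(child) for child in col.children]
        # Assumption: each struct row becomes a tuple of its field values.
        return [tuple(row) for row in zip(*field_values)]
    raise TypeError(f"unsupported column type: {type(col).__name__}")


example = StructColumn(
    names=["labels", "dense_features"],
    children=[
        ScalarColumn([0, 1]),
        ListColumn(offsets=[0, 2, 3], child=ScalarColumn([10, 20, 30])),
    ],
)
rows = to_pylist(example)
```

Each node converts its children first and then assembles its own rows, which is the same shape a C++ implementation at the `BaseColumn` level would presumably follow.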

IColumn.`sum`/`mean`/`std`/`median`/`quantile`/`mode`/`all`/`any`: https://github.com/facebookresearch/torcharrow/blob/380e1cbaf334b49d52242596c79627d456ef3b0d/torcharrow/icolumn.py#L1206-L1292 Also remove the Python implementation in the CPU backend (if there is one), since once zero-copy interop with Arrow is implemented, it's more efficient to use Arrow Compute. Eventually we...

good first issue
native kernel

For a single column, delegating to an Arrow array seems to be a good initial approach. Arrow arrays support `fill_null/drop_null`, so we can first call `to_arrow`, then call `fill_null/drop_null` on the Arrow array,...

native kernel