torcharrow
High performance model preprocessing library on PyTorch
Hi guys, since torcharrow uses the Arrow in-memory format to achieve zero-copy for external readers, I assume using Acero is more intuitive, and the conversion from Velox format to Arrow format...
Recently I needed to use PyArrow to process my dataset, and I wonder what the difference is between TorchArrow and PyArrow. Is TorchArrow faster than PyArrow? Will you provide a performance comparison?
DataFusion [https://github.com/apache/arrow-datafusion](https://github.com/apache/arrow-datafusion) is a subproject of Apache Arrow. For now I can do `ta.from_arrow(pa.Table.from_batches(datafusion_dataframe.collect()))` to 1. convert DataFusion's dataframe to Arrow record batches 2. convert the Arrow record batches to...
## Use Case In feature pre-processing we often need to cast feature IDs from int64 to int32. The cast often needs to be done recursively for complex types, such as *...
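The recursive part of such a cast can be sketched in plain Python. This is only an illustration under stated assumptions: the `Primitive`, `ListOf`, `MapOf`, and `Struct` classes and the `narrow_ids` helper are hypothetical stand-ins for a nested type description, not TorchArrow's dtype API, and the sketch rewrites only the type, not the underlying data.

```python
from dataclasses import dataclass
from typing import Tuple, Union


# Hypothetical, minimal model of nested column types (not TorchArrow's dtypes).
@dataclass(frozen=True)
class Primitive:
    name: str  # e.g. "int64", "int32", "float32"


@dataclass(frozen=True)
class ListOf:
    item: "DType"


@dataclass(frozen=True)
class MapOf:
    key: "DType"
    value: "DType"


@dataclass(frozen=True)
class Struct:
    fields: Tuple[Tuple[str, "DType"], ...]


DType = Union[Primitive, ListOf, MapOf, Struct]


def narrow_ids(dtype: DType, src: str = "int64", dst: str = "int32") -> DType:
    """Return a copy of dtype with every src primitive replaced by dst."""
    if isinstance(dtype, Primitive):
        return Primitive(dst) if dtype.name == src else dtype
    if isinstance(dtype, ListOf):
        return ListOf(narrow_ids(dtype.item, src, dst))
    if isinstance(dtype, MapOf):
        return MapOf(
            narrow_ids(dtype.key, src, dst),
            narrow_ids(dtype.value, src, dst),
        )
    if isinstance(dtype, Struct):
        return Struct(
            tuple((name, narrow_ids(t, src, dst)) for name, t in dtype.fields)
        )
    raise TypeError(f"unsupported type description: {dtype!r}")
```

For example, `narrow_ids(MapOf(Primitive("int64"), ListOf(Primitive("int64"))))` yields the same shape with both `int64` leaves replaced by `int32`. A real cast would also have to check that every value fits in the narrower type.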
Differential Revision: D37125908
## Description - This PR enables building TorchArrow with PyTorch in GitHub CI. This also allows us to test the functionality of some of the torchtext operators that were added...
Summary: Enable type inference functions to automatically deduce the row types when a column is built from a list of dataframe objects without supplying the "dtype" parameter. REMARKS: 1. The...
Following the instructions in the README to build and test torcharrow leads to an illegal hardware instruction error on OS X 10.15.7. I'm running on 2.9 GHz Dual-Core Intel Core...
Summary: the quantile arg should be in [0, 1]. As a next step I'll make quantile more performant. Reviewed By: vancexu Differential Revision: D36639711
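The range check described in this summary could look roughly like the sketch below. It is written in plain Python and is not the code from the diff: the `quantile` function name, its signature, and the linear interpolation between neighbouring values are all assumptions made for illustration.

```python
import math
from typing import Sequence


def quantile(values: Sequence[float], q: float) -> float:
    """Hypothetical helper: q-th quantile of values, with q required in [0, 1]."""
    # Reject out-of-range arguments (NaN also fails this comparison).
    if not 0.0 <= q <= 1.0:
        raise ValueError(f"quantile must be in [0, 1], got {q}")
    if len(values) == 0:
        raise ValueError("cannot take a quantile of an empty sequence")

    ordered = sorted(values)
    # Fractional position of the quantile in the sorted data.
    pos = q * (len(ordered) - 1)
    lo = math.floor(pos)
    hi = math.ceil(pos)
    # Assumed behaviour: linear interpolation between the two neighbours.
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)
```

Sorting on every call keeps the sketch short but costs O(n log n) per call, which is the kind of overhead a later performance fix would target.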