torcharrow

High performance model preprocessing library on PyTorch

Results 68 torcharrow issues

By convention, submodules live in a root-level `third_party` directory (see, e.g., `PyTorch` or `TorchAudio`). For historical reasons, TorchArrow puts its submodule at `csrc/velox`: https://github.com/facebookresearch/torcharrow/tree/main/csrc/velox

```
[670/694] Linking CXX shared module csrc/velox/_torcharrow.cpython-38-darwin.so
ld: warning: direct access in function 'facebook::torcharrow::declareArrayType(pybind11::module_&)' from file 'csrc/velox/CMakeFiles/_torcharrow.dir/lib.cpp.o' to global weak symbol 'facebook::velox::ArrayType::elementType() const' from file 'csrc/velox/velox/velox/functions/prestosql/CMakeFiles/velox_functions_prestosql.dir/ArrayContains.cpp.o' means the weak symbol...
```

This is an automated pull request to update the first-party submodule for [facebookincubator/velox](https://github.com/facebookincubator/velox). New submodule commit: https://github.com/facebookincubator/velox/commit/abe7604bcd66f4fa96c5d21a643d0261efe07a8c Test Plan: Ensure that CI jobs succeed on GitHub before landing.

CLA Signed

Hi, I noticed that some data preprocessing operations used in recommendation systems, like `bucketize`, `sigridHash`, and `firstX`, are implemented in [torcharrow/tree/main/csrc/velox/functions/rec](https://github.com/pytorch/torcharrow/tree/main/csrc/velox/functions/rec). I would like to ask if other preprocessing operations...
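For readers unfamiliar with the operations named above, here is a minimal pure-Python sketch of what a bucketize step generally does: it maps each value to the index of the bucket it falls into, given sorted borders. This is a generic illustration, not TorchArrow's implementation. The function signature and the edge convention (a value equal to a border goes into the lower bucket) are assumptions and may differ from the library.

```python
from bisect import bisect_left
from typing import List, Sequence


def bucketize(values: Sequence[float], borders: Sequence[float]) -> List[int]:
    """Map each value to the index of the first border that is >= the value.

    Values larger than every border get index len(borders).
    `borders` must be sorted in ascending order.
    Assumption: a value equal to a border lands in the lower bucket.
    """
    return [bisect_left(borders, v) for v in values]


# Three borders define four buckets: (-inf, 1], (1, 2], (2, 3], (3, +inf)
print(bucketize([0.5, 1.0, 2.5, 10.0], [1.0, 2.0, 3.0]))  # [0, 0, 2, 3]
```

Binary search keeps each lookup logarithmic in the number of borders, which matters when the same borders are applied to a large column.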

Hi, this looks like a really interesting project! I saw that torcharrow currently runs on CPU with the Velox backend. Just wondering whether there is any plan to offload some of the ops to...

Summary: Test and benchmark targets are now decoupled, which means they can be built independently. Shared functionality is moved to a common utility library. Resolves https://github.com/facebookincubator/velox/issues/1704 X-link: https://github.com/facebookincubator/velox/pull/2439 Reviewed By:...

CLA Signed
fb-exported

I'm asking for myself and also for the members of my algorithm team at my company. We currently have petabyte-scale data, which is split into Parquet files across different remote HDFS paths (per...

The `save-state` and `set-output` commands used in GitHub Actions are deprecated, and [GitHub recommends using environment files](https://github.blog/changelog/2023-07-24-github-actions-update-on-save-state-and-set-output-commands/). This PR replaces the usage of `set-output` with `$GITHUB_OUTPUT`. Instructions for envvar usage from...
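As a rough illustration of the environment-file approach, the sketch below appends a `name=value` line to the file whose path GitHub Actions exposes in the `GITHUB_OUTPUT` environment variable. The helper name `set_output` and the output name `version` are hypothetical, and the sketch only covers single-line values; it is not the code from this PR.

```python
import os


def set_output(name: str, value: str) -> None:
    """Append a single-line step output to the file named by GITHUB_OUTPUT.

    Hypothetical helper: replaces the deprecated
    `::set-output name=<name>::<value>` workflow command.
    """
    output_path = os.environ["GITHUB_OUTPUT"]  # set by the Actions runner
    with open(output_path, "a", encoding="utf-8") as fh:
        fh.write(f"{name}={value}\n")
```

Inside a workflow step, calling `set_output("version", "1.2.3")` would make the value available to later steps as an output of that step. Outside of Actions, `GITHUB_OUTPUT` can be pointed at a temporary file to try the helper locally.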

CLA Signed

### 🚀 The feature, motivation and pitch

We're working on supporting bf16 in [lance format](https://github.com/lancedb/lance), which will be presented as a bf16 extension type in Arrow (see PR for details:...

I want to use datapipe to read Parquet files in which images are stored as binary, but I got an error:

```
NotImplementedError: Unsupported Arrow type: binary
```

So I wonder...