Move Arrow code to PyCapsule / C API

Open timkpaine opened this issue 1 year ago • 3 comments

For historical reasons, we rely on an unstable C++ ABI for pyarrow through vendored code (https://github.com/Point72/csp/tree/main/cpp/csp/python/adapters/vendored/). This does not work in all circumstances: for example, when we built macOS wheels with gcc, they were incompatible with the PyPI-provided pyarrow, which is compiled with clang. We mostly get away with it.

We should move to the PyCapsule / C API.

Here is a useful example (albeit incomplete):

  • https://github.com/timkpaine/arrow-cpp-python-nocopy/blob/main/src/apn-python/cpython.h
  • https://github.com/timkpaine/arrow-cpp-python-nocopy/blob/main/src/apn-python/cpython.cpp
  • https://github.com/timkpaine/arrow-cpp-python-nocopy/blob/main/src/apn-python/common.cpp

timkpaine, May 03 '24 14:05

Update: we no longer rely on an unstable C++ ABI, but we do incur a full copy here: https://github.com/Point72/csp/blob/dc7426c08eaee22713b3ce70b99d5bf14dc801df/csp/adapters/parquet.py#L119

timkpaine, Mar 30 '25 16:03

csp/adapters/parquet.py, line 119 in dc7426c:

def _arrow_in_memory_table_to_buffers(cls, gen, startime, endtime):

We should create two separate issues:

  1. For removing the vendored code
  2. For using the PyCapsule API (to avoid the full copy)

arhamchopra, Mar 30 '25 20:03

Removing the vendored code (1) was already done, so this issue now covers only (2).

timkpaine, Mar 30 '25 23:03