[FEA] Zero-copy nested types with other GPU libraries (like Awkward array)

Open shwina opened this issue 1 year ago • 10 comments

In a conversation with @martindurant and @jpivarski, it came up that there's no supported way to exchange data zero copy between cuDF and Awkward Array (which has GPU support).

The standard zero-copy mechanisms like DLPack and __cuda_array_interface__ don't support nested types like lists or structs. Our to_arrow() and from_arrow() methods convert to and from host data, so they're no use when we want to zero-copy device data.
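To make the limitation concrete, here is a minimal sketch of the kind of descriptor __cuda_array_interface__ (version 3) exposes. The helper name `flat_descriptor` and the pointer value are made up for illustration; no GPU is touched.

```python
def flat_descriptor(ptr, length, typestr="<i8"):
    """Build a dict shaped like a version-3 __cuda_array_interface__.

    The protocol describes exactly one device buffer: one pointer, one
    shape, one dtype. `ptr` is a placeholder integer here, not a real
    device address.
    """
    return {
        "shape": (length,),
        "typestr": typestr,       # little-endian int64
        "data": (ptr, False),     # (device pointer, read_only flag)
        "version": 3,
    }


desc = flat_descriptor(0x7F2F6FA00200, 3)

# A list<int64> column such as [[1, 2], [], [3]] is not a single buffer.
# In the Arrow layout it needs at least an offsets buffer and a child
# values buffer (plus optional validity bitmaps), so one descriptor of the
# shape above cannot carry it.
buffers_needed_for_list_column = ["offsets", "values"]
```

A struct column has the same problem: each field is its own child array with its own buffers.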

Option 1

We support a gpu=True (or similar) keyword argument in to_arrow(), which would then return a PyArrow array backed by device data. PyArrow doesn't seem to support this directly, but it's possible to construct such an array by hand:

In [5]: a = cp.asarray([1, 2, 3])

In [6]: buf = pa.foreign_buffer(a.data.ptr, a.nbytes, a)

In [7]: type(buf)
Out[7]: pyarrow.lib.Buffer

In [8]: print(buf)
<pyarrow.Buffer address=0x7f2f6fa00200 size=24 is_cpu=True is_mutable=False>

The problem (as can be seen above) is that PyArrow thinks this is a CPU-backed buffer. So attempting to do anything with it segfaults:

In [9]: arr = pa.Array.from_buffers(pa.int64(), len(a), buffers=[None, buf])

In [10]: print(arr)  # segfault

Option 2

We could expose new Series.to_buffers() and Series.from_buffers() functions that would produce and consume GPU buffers (along with a schema), presumably in the same order as arrow's from_buffers and buffers methods. We could use CuPy arrays to represent the buffers.
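A rough host-side sketch of what such a pair of functions might produce and consume for a list column follows. Everything here is hypothetical: the function names, the string schema, and the use of `array.array` as a stand-in for CuPy arrays are assumptions for illustration, not an existing cuDF API. The flattened buffer order is modelled on what pyarrow's `Array.buffers()` returns for a list array.

```python
from array import array


def to_buffers(nested):
    """Flatten a list-of-lists of ints into (schema, buffers).

    Hypothetical stand-in for the proposed Series.to_buffers(); a GPU
    version would return CuPy arrays instead of array.array objects.
    """
    offsets = array("i", [0])   # C int offsets (32-bit on common platforms)
    values = array("q")         # 64-bit signed child values
    for row in nested:
        values.extend(row)
        offsets.append(len(values))
    schema = "list<int64>"      # placeholder for a real schema object
    # Order: parent validity, parent offsets, child validity, child data.
    # None means "no validity bitmap", i.e. no nulls.
    return schema, [None, offsets, None, values]


def from_buffers(schema, buffers):
    """Rebuild the nested lists from (schema, buffers)."""
    if schema != "list<int64>":
        raise NotImplementedError(schema)
    _, offsets, _, values = buffers
    return [
        list(values[offsets[i]:offsets[i + 1]])
        for i in range(len(offsets) - 1)
    ]


schema, buffers = to_buffers([[1, 2], [], [3]])
roundtrip = from_buffers(schema, buffers)
```

The point of the schema travelling with the buffers is that the consumer cannot tell from the buffers alone which one holds offsets and which holds values.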


Curious what folks think? Interested also in @kkraus14's thoughts here if any.

shwina avatar Feb 02 '24 21:02 shwina

I think #14926 is pretty relevant here.

vyasr avatar Feb 02 '24 21:02 vyasr

I agree that this is pretty much the exact use case that #14926 is designed for, along with something like a PyCapsule-based protocol.

kkraus14 avatar Feb 02 '24 21:02 kkraus14

I should add here that, from the Awkward Array side, any format that preserves all of the information is equally good. If given CuPy arrays (option 2), we might internally convert them to a format that follows a pyarrow array's Buffers so that we can reuse code that makes the adjustments between Arrow and Awkward, but that's our business.

I suggested option 1, making a pyarrow array that would segfault if you touch it, because this works for us (we'll be careful to not dereference the GPU pointers) and if pyarrow ever does add the infrastructure to interpret it correctly, the same interface on cuDF will work for both Awkward and Arrow.

jpivarski avatar Feb 03 '24 22:02 jpivarski

Ping on this, @shwina; I gather work is ongoing in the linked issue, but I would appreciate a brief summary here of the status and what we can expect for Awkward integration.

martindurant avatar Feb 09 '24 16:02 martindurant

Thanks, @martindurant - I believe we should see a PR up for #14926 soon. At that point, we would be very grateful if you could provide feedback or perhaps do some early testing!

shwina avatar Feb 09 '24 20:02 shwina

Certainly, just let us know

martindurant avatar Feb 09 '24 20:02 martindurant

Just a note that #14926 will first yield the C++-level functions and C structs, and there would likely need to be a follow-up to implement the Python protocol around it. The issue tracking that work in Arrow is here: https://github.com/apache/arrow/issues/38325

kkraus14 avatar Feb 09 '24 21:02 kkraus14

Wouldn't nanoarrow provide a way to access the DeviceArray from Cython?

shwina avatar Feb 10 '24 00:02 shwina

> Wouldn't nanoarrow provide a way to access the DeviceArray from Cython?

Would be very grateful if @paleolimbot could advise here!

shwina avatar Feb 16 '24 10:02 shwina

It's been touched on here, but I think the intention (https://github.com/apache/arrow/issues/38325) is to add a protocol, __arrow_c_device_array__(), that mirrors how __arrow_c_array__() works but with explicit non-CPU support (https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html).

When nanoarrow for Python has matured a bit it might be able to help export (and test), but the Cython needed to make the required Capsule is pretty compact and any library doing exporting should probably just copy it (or translate it to pybind11 or nanobind): https://github.com/apache/arrow-nanoarrow/blob/main/python/src/nanoarrow/_lib.pyx#L112-L127 .
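From the consumer side, using such a protocol could look roughly like the sketch below. It assumes the device method takes the same `requested_schema` argument as `__arrow_c_array__()` and returns a (schema capsule, array capsule) pair. `FakeDeviceColumn` and `export_arrow_capsules` are hypothetical names, and the returned objects are plain placeholders rather than real PyCapsules.

```python
def export_arrow_capsules(obj, requested_schema=None):
    """Return ("device" | "cpu", capsules) via whichever protocol obj offers.

    Prefers the device-aware protocol so GPU data is never routed through
    a CPU-only export path by accident.
    """
    if hasattr(obj, "__arrow_c_device_array__"):
        return "device", obj.__arrow_c_device_array__(
            requested_schema=requested_schema
        )
    if hasattr(obj, "__arrow_c_array__"):
        return "cpu", obj.__arrow_c_array__(requested_schema=requested_schema)
    raise TypeError(f"{type(obj).__name__} exports no Arrow C protocol")


class FakeDeviceColumn:
    """Stand-in producer; a real one would return two PyCapsules."""

    def __arrow_c_device_array__(self, requested_schema=None, **kwargs):
        # Placeholders for the schema capsule and the device array capsule.
        return ("<schema capsule>", "<device array capsule>")


kind, capsules = export_arrow_capsules(FakeDeviceColumn())
```

A consumer that only understands CPU data would skip the first branch and should refuse objects that offer only the device protocol.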

paleolimbot avatar Feb 16 '24 14:02 paleolimbot

@martindurant just an update here: I'm waiting for #15047 to take some shape before I try kicking the tires on accessing it from Python.

shwina avatar Feb 26 '24 20:02 shwina

@jpivarski, can you please link the experimental conversion code you wrote in Awkward?

martindurant avatar Feb 26 '24 20:02 martindurant

This is the script that I used to test conversion of cuDF's Arrow data into Awkward. (The other direction should be even easier.)

https://github.com/scikit-hep/awkward/blob/main/studies/cudf-to-awkward.py

jpivarski avatar Feb 26 '24 20:02 jpivarski

Just wanted to provide a quick status update here. I've put together a prototype of the device data capsule protocols in #15370. It's not usable yet for a number of reasons, which largely boil down to the need for a D2D copy at the moment. That may still be enough of an improvement over the D2H2D copy that our existing to/from_arrow methods do for you to find it useful for testing, and we should be able to make some progress on that soon. I've started the discussion on how best to proceed here.

vyasr avatar Mar 21 '24 23:03 vyasr