
[Story] Remove pyarrow dependency from cudf-polars

Open vyasr opened this issue 8 months ago • 5 comments

It should be possible to use cudf-polars without installing pyarrow. We make very limited use of it right now, primarily for our host-device data transfers. Now that we fully support the Arrow interchange protocol, we should be able to leverage that directly and remove the library dependency. This depends on #17046 as well as #17054.

vyasr avatar Apr 18 '25 23:04 vyasr

One current issue with the interchange protocol approach is that Polars sometimes creates large list columns instead of normal list columns. We either need to add a casting layer internally in libcudf's implementation (we're not going to support large lists any time soon) or we need a way to cast at the Python layer. It doesn't seem like Polars exposes enough knobs for that yet, though: the schema exports do not respect the requested_schema parameter, and the List type does not distinguish whether it is using 32- or 64-bit offsets under the hood.
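For context on what such a cast involves: in the Arrow format a list column stores 32-bit offsets and a large list column stores 64-bit offsets, so narrowing is only safe when the final offset fits in a signed 32-bit integer. The following is a minimal pure-Python sketch of that check, not the libcudf or Polars implementation; `narrow_offsets` is a hypothetical helper name and plain Python lists stand in for the real offset buffers.

```python
INT32_MAX = 2**31 - 1  # largest offset a normal (32-bit) Arrow list can hold


def narrow_offsets(offsets64):
    """Check that large-list (64-bit) offsets can be stored as 32-bit offsets.

    Hypothetical helper for illustration only. Returns a new list of the
    same values, or raises if the offsets are invalid or do not fit.
    """
    if not offsets64:
        raise ValueError("an offsets buffer needs at least one entry")
    if offsets64[0] < 0:
        raise ValueError("offsets must be non-negative")
    # Offsets are cumulative, so they must never decrease.
    if any(b < a for a, b in zip(offsets64, offsets64[1:])):
        raise ValueError("offsets must be non-decreasing")
    # Because offsets are non-decreasing, only the last one needs checking.
    if offsets64[-1] > INT32_MAX:
        raise OverflowError("child data too large for 32-bit list offsets")
    return list(offsets64)


# Three lists of lengths 2, 0 and 3 over a five-element child array.
small = narrow_offsets([0, 2, 2, 5])
```

The values themselves do not change; only the storage width of the offsets does, which is why the conversion is cheap whenever the child data is small enough.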

vyasr avatar Apr 18 '25 23:04 vyasr

The large list case is resolved by conversion in #18562.

vyasr avatar May 02 '25 03:05 vyasr

This also probably requires #18722.

vyasr avatar May 20 '25 04:05 vyasr

After https://github.com/rapidsai/cudf/pull/18564, there will be no direct imports of pyarrow (except in tests).

But cudf_polars will still depend on pylibcudf.interop.to_arrow to use pyarrow for:

  • Converting a pylibcudf.Scalar to, eventually, a Python scalar: https://github.com/rapidsai/cudf/issues/18921
  • Converting a pylibcudf.Table to, eventually, a polars DataFrame

mroeschke avatar May 29 '25 19:05 mroeschke

For the Table conversion, we should be able to use the versions of the function that operate on just the capsules, right? Once cudf-polars no longer imports pyarrow directly, the next step is to make all pyarrow usage inside pylibcudf optional and then see what we have to do to avoid those code paths in cudf-polars.
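As a rough illustration of the capsule-only idea: the Arrow PyCapsule interface lets a consumer ask any producer for a stream capsule through the `__arrow_c_stream__` dunder method, without importing pyarrow at all. The sketch below assumes nothing about pylibcudf's actual function names; `export_stream_capsule` and `has_pyarrow` are hypothetical helpers that show the duck-typing check and one way to treat pyarrow as an optional dependency.

```python
import importlib.util


def has_pyarrow():
    """Report whether pyarrow is installed, without importing it."""
    return importlib.util.find_spec("pyarrow") is not None


def export_stream_capsule(obj, requested_schema=None):
    """Ask an Arrow-compatible object for its C stream capsule.

    Hypothetical helper: relies only on the `__arrow_c_stream__` method
    defined by the Arrow PyCapsule interface, so no pyarrow import is needed.
    """
    method = getattr(obj, "__arrow_c_stream__", None)
    if method is None:
        raise TypeError(
            f"{type(obj).__name__} does not implement __arrow_c_stream__"
        )
    # Producers may ignore requested_schema; the interface treats it as
    # a best-effort hint.
    return method(requested_schema=requested_schema)
```

With Polars installed, passing a `polars.DataFrame` to `export_stream_capsule` should return a capsule that a capsule-aware consumer can ingest directly. A check like `has_pyarrow()` would then only be needed on the remaining code paths that still go through pyarrow objects.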

vyasr avatar Jun 11 '25 01:06 vyasr