[Story] Remove pyarrow dependency from cudf-polars
It should be possible to use cudf-polars without installing pyarrow. We make very limited use of it right now, primarily for our host-device data transfers. Now that we fully support the Arrow interchange protocol, we should be able to leverage that directly and remove the library dependency. This depends on #17046 as well as #17054.
One current issue with the interchange protocol approach is that polars sometimes creates large list columns instead of normal list columns. We either need to add a casting layer internally in libcudf's implementation (we're not going to support large lists any time soon) or we need a way to cast at the Python layer. It doesn't seem like Polars exposes enough knobs for that yet though. The schema exports do not respect the requested_schema parameter, and the List type does not distinguish whether it is using 32- or 64-bit offsets under the hood.
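To make the distinction concrete: in the Arrow C Data Interface a list with 32-bit offsets has the format string `+l`, while a large list with 64-bit offsets has `+L`. A cast between the two amounts to narrowing the offsets buffer after checking that nothing overflows. The following is only a plain-Python sketch of that idea, not the libcudf or Polars implementation; the helper names `narrow_format` and `narrow_offsets` are hypothetical.

```python
from array import array

INT32_MAX = 2**31 - 1

# Arrow C Data Interface format strings: 64-bit-offset types mapped to
# their 32-bit-offset counterparts (large list, large utf8, large binary).
LARGE_TO_SMALL_FORMAT = {"+L": "+l", "U": "u", "Z": "z"}


def narrow_format(fmt):
    """Return the 32-bit-offset format string; other formats pass through."""
    return LARGE_TO_SMALL_FORMAT.get(fmt, fmt)


def narrow_offsets(offsets):
    """Narrow 64-bit offsets to 32-bit, refusing if any value would overflow.

    Offsets are non-decreasing, so checking the last one is sufficient.
    """
    offsets = list(offsets)
    if offsets and offsets[-1] > INT32_MAX:
        raise OverflowError("offsets do not fit in 32 bits; cannot narrow")
    # Typecode "i" is a C int, which is 4 bytes on common platforms.
    return array("i", offsets)
```

A caller would only attempt the narrowing when the exported format string is one of the large variants, and would surface the overflow error otherwise.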
The large list case is resolved by conversion in #18562.
This probably also requires #18722.
After https://github.com/rapidsai/cudf/pull/18564, there will be no direct imports of pyarrow (except in tests).
But cudf_polars will still depend on `pylibcudf.interop.to_arrow` to use pyarrow for:
- Converting a `pylibcudf.Scalar` to eventually a Python scalar: https://github.com/rapidsai/cudf/issues/18921
- Converting a `pylibcudf.Table` to eventually a polars DataFrame
For the Table conversion we should be able to use the versions of the function that operate on just the capsules, right? Once cudf-polars no longer imports pyarrow directly the next step is making all pyarrow usage inside pylibcudf optional and then seeing what we have to do to not access those code paths in cudf-polars.
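As a rough illustration of what "optional" could look like, the sketch below checks for the Arrow PyCapsule dunder methods (`__arrow_c_stream__`, `__arrow_c_array__`) on an object and only imports a library lazily when a code path actually needs it. The helper names `has_module`, `supports_arrow_capsules`, and `import_optional` are hypothetical and are not part of pylibcudf or cudf-polars.

```python
import importlib
import importlib.util


def has_module(name):
    """Return True if a top-level module can be found, without importing it."""
    return importlib.util.find_spec(name) is not None


def supports_arrow_capsules(obj):
    """Return True if obj exposes the Arrow PyCapsule interface."""
    return hasattr(obj, "__arrow_c_stream__") or hasattr(obj, "__arrow_c_array__")


def import_optional(name, purpose):
    """Import a module lazily, with a clear error if it is not installed."""
    if not has_module(name):
        raise ImportError(f"{name} is required for {purpose} but is not installed")
    return importlib.import_module(name)
```

With helpers along these lines, a conversion routine could take the capsule path whenever `supports_arrow_capsules(obj)` is true and call `import_optional("pyarrow", ...)` only in the remaining fallback paths, so that importing the package itself never requires pyarrow.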