cudf
cudf copied to clipboard
Migrate expressions to pylibcudf
Description
xref #15162
Migrates expresions to use pylibcudf.
Checklist
- [x] I am familiar with the Contributing Guidelines.
- [x] New or existing tests cover these changes.
- [x] The documentation is up to date with these changes.
Draft pending resolution of Vyas's comments.
@vyasr Let's put this in as is for now?
Ah, I realise I had the same high-level issue as @vyasr, it seems like you'd decided to do that separately?
Ah, I realise I had the same high-level issue as @vyasr, it seems like you'd decided to do that separately?
Yeah, I kinda thought about this for a bit, but it's a pretty involved fix. (but eventually, I would like this to take a pylibcudf Scalar)
Easy way out is: Just convert everything to a pyarrow scalar and call interop on it (this is what DeviceScalar does). This is probably not good for the goal of getting pyarrow out of the Cython.
I could also try to delete the datetime/timedelta handling code in this PR. (I don't think that gets used in cuDF Python)
The more correct way is to use the interchange mechanisms on the array object/scalar object we take in which I talked about with Vyas.
IMO, the best way to do this is to use DLPack, so we only have to support one thing. (Not sure if we need C++ changes here, though )
(but eventually, I would like this to take a pylibcudf Scalar)
Why is this one so hard? I think you just need to have a big switch statement in the constructor that takes the type-erased Scalar object and constructs (via casting) the correct numeric_scalar/timestamp_scalar/duration_scalar.
Effectively you would just replace the current switch statement that dispatches on numpy-like types with one that dispatches on scalar.type().id(), I think.
What am I missing?
I could also try to delete the datetime/timedelta handling code in this PR. (I don't think that gets used in cuDF Python)
This is kind of critical for predicate pushdown in cudf-polars.
(but eventually, I would like this to take a pylibcudf Scalar)
Why is this one so hard? I think you just need to have a big switch statement in the constructor that takes the type-erased
Scalarobject and constructs (via casting) the correctnumeric_scalar/timestamp_scalar/duration_scalar.
I meant creating the Scalar itself. The Scalar constructor has to accept a lot of possible inputs (e.g. Python scalars, Numpy scalars, Pyarrow scalars, potentially cupy scalars)
I think your proposed approach would work after we fixed the Scalar constructor.
(actually now that I think about it, maybe fixing this isn't a very immediate concern, I thought we'd need pyarrow in the Cython, but we can just force people to do plc.interop.from_arrow in their Python code outside the Cython)
EDIT: I will try the approach taking in a Scalar.
I think your proposed approach would work after we fixed the Scalar constructor.
Yeah, I think from the pylibcudf point of view in some sense, for the expression API, it is not (or should not be) the job of pylibcudf to help with the construction of a Scalar in the first place.
/merge