cudf
cudf copied to clipboard
Eagerly populate the class dict for cudf.pandas proxy types
Rather than dynamically looking up class attributes (and methods), this PR makes it so that we eagerly populate the class with all known methods and attributes (by inspecting the "slow" class).
This solves a number of problems:
- it makes
getattr
trivially inexpensive (no dynamic__getattr__
for each attribute access) - it ensures the same object is returned every time you do, e.g.,
DataFrame.max
- it makes tab completion fast because the attributes don't have to be computed each time
- it no longer exposes attributes that are specific to cuDF - for example
Series.list
- it allows subclassing of proxy types to work better. For example, derived types can now call
super().
to access attributes of base types
This pull request requires additional validation before any workflows can run on NVIDIA's runners.
Pull request vetters can view their responsibilities here.
Contributors can view more details about this message here.
Generally looks good to me. Any impact on performance or the pass rate of the pandas test suite?
Generally looks good to me. Any impact on performance or the pass rate of the pandas test suite?
w.r.t benchmarks (and assuming we can unblock them) - would an improvement (or at least no degradation) in the time it takes to run the pandas unit tests be a good benchmark? (@bdice)
w.r.t benchmarks (and assuming we can unblock them) - would an improvement (or at least no degradation) in the time it takes to run the pandas unit tests be a good benchmark? (@bdice)
Yes, that's sufficient for me. Of course, also consider the pass rate. If it changes very much, the time could change as well.
/ok to test
/okay to test
/okay to test
/ok to test
@vyasr It looks like there are hangs, I'm currently debugging them locally. The run is currently taking more than 2 hours(ideally it has be completed ~30 mins). I'll push the changes to this PR once I have them ready.
Got it, thanks for the update @galipremsagar! I was clearing out my backlog and thought this PR might be ready.
/okay to test
/okay to test
/okay to test
/okay to test
/okay to test
/okay to test
/okay to test
/okay to test
/okay to test
/okay to test
/okay to test
/okay to test
/okay to test
Thanks for the PR! I had a quick look at the diff to see if I could find a non-regression test for the simple(?) use-case that triggered this: "tab completing in ipython". I think there isn't one (but the diff is quite big so maybe I missed it). I don't know how intricate the mechanism is that ipython uses to lookup things to suggest for tab completion, but maybe it is possible to add a test to make sure the bug doesn't come back in the future.
What do you think?
question: It seems like this branch is also pulling in a bunch of unrelated (?) changes in cudf-proper. Is that deliberate in the sense these are bug-fixes for issues that are exposed by these changes?
Yes, those are all the bug-fixes for the issues that are exposed by these changes, failures can be found here: https://github.com/rapidsai/cudf/actions/runs/8916717301/attempts/2#summary-24512387065
The initial set of changes caused 12% drop in pass-rate, with the current bug fixes we are down to 2% decrease (still WIP)
question: It seems like this branch is also pulling in a bunch of unrelated (?) changes in cudf-proper. Is that deliberate in the sense these are bug-fixes for issues that are exposed by these changes?
Yes, those are all the bug-fixes for the issues that are exposed by these changes, failures can be found here: https://github.com/rapidsai/cudf/actions/runs/8916717301/attempts/2#summary-24512387065
The initial set of changes caused 12% drop in pass-rate, with the current bug fixes we are down to 2% decrease (still WIP)
Aha, thanks!
/okay to test
/okay to test
/okay to test
/okay to test