cudf Eagerly populate the class dict for cudf.pandas proxy types

Rather than dynamically looking up class attributes (and methods), this PR makes it so that we eagerly populate the class with all known methods and attributes (by inspecting the "slow" class).

This solves a number of problems:

it makes getattr trivially inexpensive (no dynamic __getattr__ for each attribute access)
it ensures the same object is returned every time you do, e.g., DataFrame.max
it makes tab completion fast because the attributes don't have to be computed each time
it no longer exposes attributes that are specific to cuDF - for example Series.list
it allows subclassing of proxy types to work better. For example, derived types can now call super(). to access attributes of base types

Nov 30 '23 13:11 shwina

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Feb 22 '24 19:02 copy-pr-bot[bot]

Generally looks good to me. Any impact on performance or the pass rate of the pandas test suite?

Mar 12 '24 14:03 bdice

Generally looks good to me. Any impact on performance or the pass rate of the pandas test suite?

w.r.t benchmarks (and assuming we can unblock them) - would an improvement (or at least no degradation) in the time it takes to run the pandas unit tests be a good benchmark? (@bdice)

Mar 12 '24 20:03 shwina

w.r.t benchmarks (and assuming we can unblock them) - would an improvement (or at least no degradation) in the time it takes to run the pandas unit tests be a good benchmark? (@bdice)

Yes, that's sufficient for me. Of course, also consider the pass rate. If it changes very much, the time could change as well.

Mar 13 '24 13:03 bdice

/ok to test

Apr 11 '24 20:04 vyasr

/okay to test

Apr 12 '24 00:04 galipremsagar

/okay to test

Apr 12 '24 14:04 galipremsagar

/ok to test

Apr 16 '24 20:04 vyasr

@vyasr It looks like there are hangs, I'm currently debugging them locally. The run is currently taking more than 2 hours(ideally it has be completed ~30 mins). I'll push the changes to this PR once I have them ready.

Apr 16 '24 20:04 galipremsagar

Got it, thanks for the update @galipremsagar! I was clearing out my backlog and thought this PR might be ready.

Apr 16 '24 21:04 vyasr

/okay to test

Apr 17 '24 14:04 galipremsagar

/okay to test

Apr 17 '24 15:04 galipremsagar

/okay to test

Apr 17 '24 20:04 galipremsagar

/okay to test

Apr 19 '24 21:04 galipremsagar

/okay to test

Apr 20 '24 02:04 galipremsagar

/okay to test

Apr 23 '24 18:04 galipremsagar

/okay to test

Apr 24 '24 20:04 galipremsagar

/okay to test

Apr 26 '24 00:04 galipremsagar

/okay to test

Apr 26 '24 16:04 galipremsagar

/okay to test

Apr 29 '24 22:04 galipremsagar

/okay to test

Apr 30 '24 03:04 galipremsagar

/okay to test

Apr 30 '24 20:04 galipremsagar

/okay to test

May 02 '24 00:05 galipremsagar

Thanks for the PR! I had a quick look at the diff to see if I could find a non-regression test for the simple(?) use-case that triggered this: "tab completing in ipython". I think there isn't one (but the diff is quite big so maybe I missed it). I don't know how intricate the mechanism is that ipython uses to lookup things to suggest for tab completion, but maybe it is possible to add a test to make sure the bug doesn't come back in the future.

What do you think?

May 02 '24 09:05 betatim

question: It seems like this branch is also pulling in a bunch of unrelated (?) changes in cudf-proper. Is that deliberate in the sense these are bug-fixes for issues that are exposed by these changes?

Yes, those are all the bug-fixes for the issues that are exposed by these changes, failures can be found here: https://github.com/rapidsai/cudf/actions/runs/8916717301/attempts/2#summary-24512387065

The initial set of changes caused 12% drop in pass-rate, with the current bug fixes we are down to 2% decrease (still WIP)

May 02 '24 16:05 galipremsagar

question: It seems like this branch is also pulling in a bunch of unrelated (?) changes in cudf-proper. Is that deliberate in the sense these are bug-fixes for issues that are exposed by these changes?

Yes, those are all the bug-fixes for the issues that are exposed by these changes, failures can be found here: https://github.com/rapidsai/cudf/actions/runs/8916717301/attempts/2#summary-24512387065

The initial set of changes caused 12% drop in pass-rate, with the current bug fixes we are down to 2% decrease (still WIP)

Aha, thanks!

May 02 '24 17:05 wence-

/okay to test

May 03 '24 14:05 galipremsagar

/okay to test

May 16 '24 14:05 galipremsagar

/okay to test

May 16 '24 17:05 galipremsagar

/okay to test

May 16 '24 21:05 galipremsagar

cudf cudf copied to clipboard

Eagerly populate the class dict for cudf.pandas proxy types

cudf
cudf copied to clipboard