cudf icon indicating copy to clipboard operation
cudf copied to clipboard

Eagerly populate the class dict for cudf.pandas proxy types

Open shwina opened this issue 1 year ago • 27 comments

Rather than dynamically looking up class attributes (and methods), this PR makes it so that we eagerly populate the class with all known methods and attributes (by inspecting the "slow" class).

This solves a number of problems:

  • it makes getattr trivially inexpensive (no dynamic __getattr__ for each attribute access)
  • it ensures the same object is returned every time you do, e.g., DataFrame.max
  • it makes tab completion fast because the attributes don't have to be computed each time
  • it no longer exposes attributes that are specific to cuDF - for example Series.list
  • it allows subclassing of proxy types to work better. For example, derived types can now call super(). to access attributes of base types

shwina avatar Nov 30 '23 13:11 shwina

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

copy-pr-bot[bot] avatar Feb 22 '24 19:02 copy-pr-bot[bot]

Generally looks good to me. Any impact on performance or the pass rate of the pandas test suite?

bdice avatar Mar 12 '24 14:03 bdice

Generally looks good to me. Any impact on performance or the pass rate of the pandas test suite?

w.r.t benchmarks (and assuming we can unblock them) - would an improvement (or at least no degradation) in the time it takes to run the pandas unit tests be a good benchmark? (@bdice)

shwina avatar Mar 12 '24 20:03 shwina

w.r.t benchmarks (and assuming we can unblock them) - would an improvement (or at least no degradation) in the time it takes to run the pandas unit tests be a good benchmark? (@bdice)

Yes, that's sufficient for me. Of course, also consider the pass rate. If it changes very much, the time could change as well.

bdice avatar Mar 13 '24 13:03 bdice

/ok to test

vyasr avatar Apr 11 '24 20:04 vyasr

/okay to test

galipremsagar avatar Apr 12 '24 00:04 galipremsagar

/okay to test

galipremsagar avatar Apr 12 '24 14:04 galipremsagar

/ok to test

vyasr avatar Apr 16 '24 20:04 vyasr

@vyasr It looks like there are hangs, I'm currently debugging them locally. The run is currently taking more than 2 hours(ideally it has be completed ~30 mins). I'll push the changes to this PR once I have them ready.

galipremsagar avatar Apr 16 '24 20:04 galipremsagar

Got it, thanks for the update @galipremsagar! I was clearing out my backlog and thought this PR might be ready.

vyasr avatar Apr 16 '24 21:04 vyasr

/okay to test

galipremsagar avatar Apr 17 '24 14:04 galipremsagar

/okay to test

galipremsagar avatar Apr 17 '24 15:04 galipremsagar

/okay to test

galipremsagar avatar Apr 17 '24 20:04 galipremsagar

/okay to test

galipremsagar avatar Apr 19 '24 21:04 galipremsagar

/okay to test

galipremsagar avatar Apr 20 '24 02:04 galipremsagar

/okay to test

galipremsagar avatar Apr 23 '24 18:04 galipremsagar

/okay to test

galipremsagar avatar Apr 24 '24 20:04 galipremsagar

/okay to test

galipremsagar avatar Apr 26 '24 00:04 galipremsagar

/okay to test

galipremsagar avatar Apr 26 '24 16:04 galipremsagar

/okay to test

galipremsagar avatar Apr 29 '24 22:04 galipremsagar

/okay to test

galipremsagar avatar Apr 30 '24 03:04 galipremsagar

/okay to test

galipremsagar avatar Apr 30 '24 20:04 galipremsagar

/okay to test

galipremsagar avatar May 02 '24 00:05 galipremsagar

Thanks for the PR! I had a quick look at the diff to see if I could find a non-regression test for the simple(?) use-case that triggered this: "tab completing in ipython". I think there isn't one (but the diff is quite big so maybe I missed it). I don't know how intricate the mechanism is that ipython uses to lookup things to suggest for tab completion, but maybe it is possible to add a test to make sure the bug doesn't come back in the future.

What do you think?

betatim avatar May 02 '24 09:05 betatim

question: It seems like this branch is also pulling in a bunch of unrelated (?) changes in cudf-proper. Is that deliberate in the sense these are bug-fixes for issues that are exposed by these changes?

Yes, those are all the bug-fixes for the issues that are exposed by these changes, failures can be found here: https://github.com/rapidsai/cudf/actions/runs/8916717301/attempts/2#summary-24512387065

The initial set of changes caused 12% drop in pass-rate, with the current bug fixes we are down to 2% decrease (still WIP)

galipremsagar avatar May 02 '24 16:05 galipremsagar

question: It seems like this branch is also pulling in a bunch of unrelated (?) changes in cudf-proper. Is that deliberate in the sense these are bug-fixes for issues that are exposed by these changes?

Yes, those are all the bug-fixes for the issues that are exposed by these changes, failures can be found here: https://github.com/rapidsai/cudf/actions/runs/8916717301/attempts/2#summary-24512387065

The initial set of changes caused 12% drop in pass-rate, with the current bug fixes we are down to 2% decrease (still WIP)

Aha, thanks!

wence- avatar May 02 '24 17:05 wence-

/okay to test

galipremsagar avatar May 03 '24 14:05 galipremsagar

/okay to test

galipremsagar avatar May 16 '24 14:05 galipremsagar

/okay to test

galipremsagar avatar May 16 '24 17:05 galipremsagar

/okay to test

galipremsagar avatar May 16 '24 21:05 galipremsagar