cudf icon indicating copy to clipboard operation
cudf copied to clipboard

Parse (non-MultiIndex) label-based keys to structured data

Open wence- opened this issue 2 years ago • 4 comments

Description

Following on from #13534, this extends the scheme to handle label-based lookups as long as the index is not a multiindex.

As is the case for positional indexing, all of the different ways one can index a frame with labels eventually boil down to indexing by slice, boolean mask, map, or scalar in libcudf. loc-based keys are parsed into information that tags them by type. Since this information is the same as is used for iloc indexing, we can then dispatch to the same "internal" calls that don't do further bounds-checking or normalisation: rather than converting a label-based lookup to an argument we can pass to iloc-getitem (which must reinterpret it), we just take the decision straight away.

The next stage (which will help to remove a bunch of code) will be to handle multiindex keys, but that will be sufficiently complicated that I'd rather do it separately.

  • Closes #13014
  • Closes #13268
  • Closes #13379
  • Closes #13652
  • Closes #13653
  • Closes #13658 (or at least, maintains the correctness)

Checklist

  • [x] I am familiar with the Contributing Guidelines.
  • [x] New or existing tests cover these changes.
  • [x] The documentation is up to date with these changes.

wence- avatar Jul 18 '23 15:07 wence-

Explicitly requested the same set of reviewers as #13534.

On the one hand, I would have liked to handle multiindex lookups as well in one change, but unfortunately I think the rules are too complicated and I'm still waiting for clarification on a bunch of ambiguities in the way pandas handles things. As a consequence this doesn't remove as much code as one would have hoped.

wence- avatar Jul 18 '23 15:07 wence-

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

copy-pr-bot[bot] avatar Sep 01 '23 11:09 copy-pr-bot[bot]

/ok to test

wence- avatar Sep 01 '23 13:09 wence-

@wence- did this stall out just waiting for reviews? If so, I'm very sorry I lost track of it. Please bring it up to date and I'm happy to review this ASAP (probably for 24.08 not 24.06 at this point though).

vyasr avatar May 22 '24 20:05 vyasr

@wence- did this stall out just waiting for reviews? If so, I'm very sorry I lost track of it. Please bring it up to date and I'm happy to review this ASAP (probably for 24.08 not 24.06 at this point though).

Sorry, it got rather struck on my end, I will try and resurrect for 24.10

wence- avatar Jul 31 '24 16:07 wence-

I am loathe to close this PR outright given how many issues that it fixes, but I also fear that it is woefully out of sync with the current state of the code base given how much @mroeschke has refactored the underlying code. Also, I doubt that @wence- will have time to return to this work any time soon given the various other major pushes that he is currently involved with.

@mroeschke and @wence- do you think it might make sense for there to be some knowledge transfer here (in the form of pair coding or simply discussion) to try and get this work over the finish line, perhaps in the form of multiple smaller and more easily digestible PRs?

vyasr avatar May 16 '25 22:05 vyasr

Happy to try and do that, yes

wence- avatar May 19 '25 11:05 wence-

Yeah I can try to digest @wence-'s change here and reimplement it amidst all the refactors. I can reach out to @wence- if I need any clarification.

mroeschke avatar May 19 '25 17:05 mroeschke

Yeah I can try to digest @wence-'s change here and reimplement it amidst all the refactors. I can reach out to @wence- if I need any clarification.

Note I also flamed out a bit on the multiindex stuff because I couldn't understand what pandas was doing (and at least when I was looking, was ambiguous in some cases).

See, for reference:

  • pandas-dev/pandas#53535
  • pandas-dev/pandas#53613
  • pandas-dev/pandas#53995
  • pandas-dev/pandas#53996
  • pandas-dev/pandas#54009
  • pandas-dev/pandas#54019

wence- avatar May 19 '25 17:05 wence-