[AIR] Maintain dtype info in LightGBMPredictor

Open Yard1 opened this issue 3 years ago • 0 comments

Signed-off-by: Antoni Baum [email protected]

Why are these changes needed?

We always convert to NumPy and then back to a DataFrame in LightGBMPredictor, and try to infer dtypes in between. This inference is imprecise: it allows for an edge case where a Categorical column composed of integers is classified as an int column. The round trip also decreases performance. This PR keeps dtype information where possible by not converting to NumPy unnecessarily. The inference logic is still present for the tensor column case; I am not familiar enough with it to fix it here (if it needs fixing in the first place).
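The edge case described above can be reproduced with plain pandas, independent of Ray. The following is a minimal sketch, not the predictor's actual code: the column name `feature` and the sample values are hypothetical, and the round trip is a simplified stand-in for what happens inside the predictor.

```python
import pandas as pd

# Hypothetical input: a Categorical column whose categories are integers.
df = pd.DataFrame({"feature": pd.Categorical([1, 2, 3, 1])})

# Simplified stand-in for the round trip: DataFrame -> NumPy -> DataFrame.
arr = df["feature"].to_numpy()
rebuilt = pd.DataFrame({"feature": arr})

# The categorical dtype survives only if the DataFrame is passed through as-is.
original_is_categorical = isinstance(df["feature"].dtype, pd.CategoricalDtype)
rebuilt_is_categorical = isinstance(rebuilt["feature"].dtype, pd.CategoricalDtype)
rebuilt_is_integer = pd.api.types.is_integer_dtype(rebuilt["feature"].dtype)

print(original_is_categorical, rebuilt_is_categorical, rebuilt_is_integer)
```

After the round trip the column is a plain integer column, so a model that treats categorical and numeric features differently would see a different feature type than the one it was given.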

Related issue number

Closes https://github.com/ray-project/ray/issues/28619

Checks

  • [x] I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • [x] I've run scripts/format.sh to lint the changes in this PR.
  • [ ] I've included any doc changes needed for https://docs.ray.io/en/master/.
  • [x] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • [x] Unit tests
    • [ ] Release tests
    • [ ] This PR is not tested :(

Yard1, Sep 21 '22 17:09