[python] Reapply the trained categorical columns when predicting

Open johnpaulett opened this issue 3 years ago • 2 comments

Fixes #5244. During prediction, force any columns that were categorical during training to dtype category again. This is useful when the model is hosted via kserve and the user sends an HTTP JSON POST, which does not natively get translated to a categorical column in the DataFrame.

Initially tried coding this change in _data_from_pandas, but elected to pull it into a separate method that is only called by predict(). I'm open to any feedback or suggestion on how to better implement this change.
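
To illustrate the idea, here is a minimal sketch using only pandas. It is not the code in this PR: the helper name reapply_categorical and the trained_categories mapping are hypothetical, and the sketch assumes the category values seen at training time have been kept somewhere alongside the model.

```python
import pandas as pd


def reapply_categorical(df, categories_by_column):
    """Return a copy of df with the listed columns cast to category dtype.

    categories_by_column is a hypothetical mapping from column name to the
    list of category values seen at training time.
    """
    out = df.copy()
    for column, categories in categories_by_column.items():
        if column in out.columns:
            # Reusing the training categories keeps the integer codes stable.
            dtype = pd.CategoricalDtype(categories=categories)
            out[column] = out[column].astype(dtype)
    return out


# Training-time frame: "color" is a categorical column.
train = pd.DataFrame(
    {"color": pd.Categorical(["red", "green", "blue"]), "size": [1.0, 2.0, 3.0]}
)

# Record the categories of every categorical column used in training.
trained_categories = {
    name: list(train[name].cat.categories)
    for name in train.columns
    if isinstance(train[name].dtype, pd.CategoricalDtype)
}

# A frame as it might arrive from a JSON payload: plain strings, no category dtype.
request = pd.DataFrame({"color": ["blue", "red"], "size": [0.5, 1.5]})

fixed = reapply_categorical(request, trained_categories)
print(fixed.dtypes)
```

Casting with the stored training categories, rather than calling astype("category") on the incoming data alone, matters because categories inferred from a single request would generally produce different integer codes than the ones used in training. Values not seen in training become missing values after the cast.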

johnpaulett avatar May 27 '22 18:05 johnpaulett

CLA assistant check
All CLA requirements met.

ghost avatar May 27 '22 18:05 ghost

This appears to not work when loading a saved model via model_file as the params are not read in. I'm looking at options.

@jameslamb pointed out the params issue is likely #2613 (#4802)

johnpaulett avatar May 29 '22 11:05 johnpaulett

Hi @johnpaulett. We've merged a PR that loads the parameters from the model file, so now you can access categorical_feature from params, i.e.

import lightgbm as lgb

bst = lgb.Booster(model_file='model.txt')
bst.params['categorical_feature']

Please let us know if you want to continue with this.

jmoralez avatar Oct 11 '22 19:10 jmoralez

@jmoralez Wonderful -- let me look at rebasing and testing. I do think this would be valuable, as I currently maintain a fork of kserve's lgbserver docker image that side loads these features in.

johnpaulett avatar Oct 12 '22 21:10 johnpaulett

Thanks! Please use merge commits instead of rebasing, though, for the reasons described in https://github.com/microsoft/LightGBM/pull/5252#issuecomment-1252839567.

jameslamb avatar Oct 12 '22 21:10 jameslamb

Hi!

I was wondering what the progress is on this PR and whether it's still on the roadmap, as I'm running into the exact problem @johnpaulett described in the first post. I'm not sure what a different workaround would look like if I want to keep the category dtypes and not do some category-integer mapping.

spatiebalk avatar Feb 03 '23 09:02 spatiebalk