LightGBM
[python] Reapply the trained categorical columns when predicting
Fixes #5244. During prediction, force any columns that were categorical during training back to the category dtype. This is useful when the model is hosted via kserve and the user sends an HTTP JSON POST, whose values will not natively be translated into a categorical column in the DataFrame.
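A minimal sketch of the idea, assuming the JSON body has already been parsed into a list of records and that the categorical column names and their training-time category levels are known (for example, side-loaded alongside the model). The helper name `reapply_categoricals` is hypothetical and not part of LightGBM's API. It passes the training levels explicitly instead of calling a bare `astype("category")`. That keeps the integer codes aligned with what the model saw during training, and values never seen in training become NaN instead of shifting the codes.

```python
import pandas as pd


def reapply_categoricals(df, categories):
    """Hypothetical helper: cast columns back to the categorical dtypes used in training.

    `categories` maps column name -> list of category levels seen at training time.
    """
    out = df.copy()  # leave the caller's DataFrame untouched
    for col, levels in categories.items():
        # Fixing the level list keeps category codes identical to training;
        # values not in `levels` become NaN (code -1).
        out[col] = out[col].astype(pd.CategoricalDtype(categories=levels))
    return out


# Records as they might arrive in an HTTP JSON POST body: plain strings, not categoricals.
records = [
    {"color": "blue", "size": 3.0},
    {"color": "red", "size": 1.5},
    {"color": "green", "size": 2.0},
]
df = pd.DataFrame(records)
print(isinstance(df["color"].dtype, pd.CategoricalDtype))  # False

fixed = reapply_categoricals(df, {"color": ["red", "blue"]})
print(fixed["color"].cat.codes.tolist())  # [1, 0, -1]
```

In a serving wrapper, `fixed` could then be passed to `Booster.predict()` in place of the raw DataFrame, with no manual category-to-integer mapping.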
Initially tried coding this change in _data_from_pandas, but elected to pull it into a separate method that is only called by predict(). I'm open to any feedback or suggestions on how to implement this change better.
This does not appear to work when loading a saved model via model_file, because the params are not read in. I'm looking at options.
@jameslamb pointed out that the params issue is likely #2613 (#4802).
Hi @johnpaulett. We've merged a PR that loads the parameters from the model file, so you can now access categorical_feature from params, e.g.:
import lightgbm as lgb
bst = lgb.Booster(model_file='model.txt')
bst.params['categorical_feature']
Please let us know if you want to continue with this.
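`params` identifies which features were categorical, but not their category levels. LightGBM's Python package stores those separately. The saved model text ends with a `pandas_categorical:` line holding a JSON list of category lists, one per pandas categorical column in training-column order, and the loaded Booster exposes it as `bst.pandas_categorical`. The stdlib-only sketch below shows what that line contains and how it can be read from a model string. The function name is hypothetical, the model string is illustrative and truncated, and the format assumption should be checked against the LightGBM version in use.

```python
import json
from typing import List, Optional

PANDAS_KEY = "pandas_categorical:"


def read_pandas_categorical(model_str: str) -> Optional[List[list]]:
    """Hypothetical helper: return the category lists stored at the end of a model string.

    Assumes the last non-empty line looks like 'pandas_categorical:<json>'
    (the JSON value is null when no pandas categoricals were used in training).
    """
    for line in reversed(model_str.splitlines()):
        line = line.strip()
        if not line:
            continue  # skip trailing blank lines
        if line.startswith(PANDAS_KEY):
            return json.loads(line[len(PANDAS_KEY):])
        return None  # last non-empty line is something else: no stored categories
    return None


# Illustrative tail of a model text file (the real body is omitted).
model_str = '...model body omitted...\n\npandas_categorical:[["red", "blue"]]\n'
print(read_pandas_categorical(model_str))  # [['red', 'blue']]
```

Pairing these lists, in order, with the categorical column names gives the per-column levels that incoming data can be cast back to.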
@jmoralez Wonderful, let me look at rebasing and testing. I do think this would be valuable, as I currently maintain a fork of kserve's lgbserver Docker image that side-loads these features.
Thanks! Please use merge commits instead of rebasing, though, for the reasons described in https://github.com/microsoft/LightGBM/pull/5252#issuecomment-1252839567.
Hi!
I was wondering what the progress is on this PR and whether it's still on the roadmap, as I'm running into the exact problem @johnpaulett described in the first post. I'm not sure what a different workaround would look like if I want to keep the category dtypes and avoid a manual category-to-integer mapping.