
Discrepancy between models loaded by xgboost and lightgbm runtimes

Open ascillitoe opened this issue 2 years ago • 0 comments

Issue

The mlserver_xgboost runtime loads models as follows:

https://github.com/SeldonIO/MLServer/blob/6864a2ddf90bc3c81e9bf178b1baeba3931de28a/runtimes/xgboost/mlserver_xgboost/xgboost.py#L24-L34

Whilst the mlserver_lightgbm runtime does:

https://github.com/SeldonIO/MLServer/blob/6864a2ddf90bc3c81e9bf178b1baeba3931de28a/runtimes/lightgbm/mlserver_lightgbm/lightgbm.py#L22

The result is that for xgboost we end up with a scikit-learn API model, whereas for lightgbm we end up with the raw Booster. The latter does not have a predict_proba method (for classifiers), hence infer_output='predict_proba' is not supported.
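To illustrate the shape of the problem without pulling in the real libraries, here is a minimal sketch using hypothetical stand-in classes (not the actual xgboost/lightgbm objects) that mirror the two APIs:

```python
class SklearnAPIModel:
    """Stand-in for a scikit-learn API wrapper (e.g. an XGBClassifier)."""

    def predict(self, X):
        return [0 for _ in X]

    def predict_proba(self, X):
        # Class probabilities are available on the sklearn API wrapper.
        return [[0.7, 0.3] for _ in X]


class RawBooster:
    """Stand-in for a raw lightgbm Booster: raw scores only."""

    def predict(self, X):
        return [0.3 for _ in X]


def supports_predict_proba(model) -> bool:
    """Check whether infer_output='predict_proba' could be honoured."""
    return callable(getattr(model, "predict_proba", None))
```

With these stand-ins, `supports_predict_proba(SklearnAPIModel())` holds but `supports_predict_proba(RawBooster())` does not, which is exactly the discrepancy between the two runtimes.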

Solutions

https://github.com/microsoft/LightGBM/issues/4841 suggests that lightgbm scikit-learn API models can (should?) be saved/loaded via joblib. We could add joblib.load support for the case where the model artefact's suffix is .joblib.
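A rough sketch of what that could look like, assuming suffix-based dispatch (the loader table here is illustrative and injected so the snippet stays self-contained; in the real runtime ".joblib" would map to joblib.load and the default would remain lgb.Booster(model_file=...)):

```python
from pathlib import Path

def load_model(model_uri: str, loaders: dict):
    """Pick a loader based on the model artefact's file suffix.

    `loaders` maps suffixes (e.g. ".joblib") to callables; the entry
    under "" is the fallback for any unrecognised suffix.
    """
    suffix = Path(model_uri).suffix.lower()
    loader = loaders.get(suffix, loaders[""])
    return loader(model_uri)
```

This keeps the existing Booster path as the default, so current model artefacts would load exactly as before, while .joblib artefacts would come back as the scikit-learn API wrapper (with predict_proba intact).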

Alternatively, we could implement something similar to our xgboost implementation. However, converting a Booster to a scikit-learn API model involves accessing private attributes, so it might be brittle.

Related

See discussion here: https://github.com/SeldonIO/MLServer/pull/1279#discussion_r1252798610. Scikit-learn API models (or any models with predict_proba) are required for white-box explainers such as TreeShap.

ascillitoe · Jul 06 '23 10:07