
Treelite gives different predictions than the base XGBoost model

Open · juliuscoburger opened this issue 5 months ago · 0 comments

I noticed that my Treelite model returns different scores than the original XGBoost model. I was able to narrow the issue down to setting `base_score` during training. Could it be that this value is not being translated correctly?
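To show why a mistranslated `base_score` would change every score, here is a small stdlib-only sketch. It assumes that XGBoost's `count:poisson` objective uses a log link. Under that assumption, `base_score` is given in count space, it becomes a base margin of `log(base_score)`, the leaf values of all trees are added to that margin, and `exp()` maps the result back to counts. The per-row leaf sums and the fallback value 0.5 (the historical XGBoost default) are hypothetical. They are not taken from the models above.

```python
import math

# Sketch, assuming count:poisson uses a log link:
# prediction = exp(log(base_score) + sum of leaf values).


def poisson_prediction(tree_sum: float, base_score: float) -> float:
    """Return exp(log(base_score) + tree_sum) for one row."""
    base_margin = math.log(base_score)  # count space -> margin space
    return math.exp(base_margin + tree_sum)


# Hypothetical per-row sums of leaf values (not from the issue).
tree_sums = [-0.4, 0.0, 0.3, 1.2]

# Scores when base_score=3000 is translated to margin space correctly.
expected = [poisson_prediction(s, 3000.0) for s in tree_sums]

# Scores from a hypothetical converter that ignores base_score and uses 0.5 instead.
mistranslated = [poisson_prediction(s, 0.5) for s in tree_sums]

# Every row is off by the same factor, 3000 / 0.5 = 6000 (up to float rounding).
ratios = [e / m for e, m in zip(expected, mistranslated)]
print(ratios)
```

If the mismatch really comes from `base_score`, the symptom should look like this: one constant factor across all rows, not noise that varies from row to row.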

Code to replicate the issue:

```python
import numpy as np
import xgboost as xgb
import treelite

np.random.seed(42)
N = 10
X = np.random.random((N, 10))
y = np.random.random((N,))
dtrain = xgb.DMatrix(X, label=y)
bst = xgb.train({
    'objective': 'count:poisson'
}, dtrain, 10)
bst.save_model('/tmp/bst.json')
tl_model = treelite.frontend.load_xgboost_model('/tmp/bst.json')
# Treelite gives the same predictions as XGBoost
np.testing.assert_almost_equal(treelite.gtil.predict(tl_model, data=X).squeeze(), bst.predict(dtrain))


# Poisson will fail for sufficiently high predictions, see https://github.com/dmlc/xgboost/issues/10486
y = np.random.random((N,)) * 3000
dtrain = xgb.DMatrix(X, label=y)
# But the issue can be mitigated by setting a sufficiently high base_score
bst = xgb.train({
    'objective': 'count:poisson',
    'base_score': 3000
}, dtrain, 10)
bst.save_model('/tmp/bst.json')

tl_model = treelite.frontend.load_xgboost_model('/tmp/bst.json')
# Unfortunately, Treelite now gives different predictions (this assertion fails)
np.testing.assert_almost_equal(treelite.gtil.predict(tl_model, data=X).squeeze(), bst.predict(dtrain))
```
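One way to confirm whether `base_score` is the culprit is to compare raw margins instead of transformed scores. If only the base margin is mistranslated, the two margin vectors should differ by the same offset on every row. The helper `constant_margin_offset` below is hypothetical and not part of XGBoost or Treelite. It is a stdlib-only sketch, and the synthetic margins it runs on are made up for illustration.

```python
import math
from typing import Optional, Sequence


def constant_margin_offset(
    reference_margins: Sequence[float],
    converted_margins: Sequence[float],
    tol: float = 1e-4,  # loose tolerance, since margins are often float32
) -> Optional[float]:
    """Return the shared offset if the margins differ by a constant, else None."""
    if len(reference_margins) != len(converted_margins) or len(reference_margins) == 0:
        raise ValueError("margin vectors must be non-empty and of equal length")
    diffs = [float(r) - float(c) for r, c in zip(reference_margins, converted_margins)]
    offset = sum(diffs) / len(diffs)
    # Every row must agree with the mean difference for the offset to be "constant".
    if all(abs(d - offset) <= tol for d in diffs):
        return offset
    return None


# Synthetic example (hypothetical): the converted model used log(0.5) as its
# base margin instead of log(3000).
leaf_sums = [-0.4, 0.0, 0.3, 1.2]
xgb_margins = [math.log(3000.0) + s for s in leaf_sums]
tl_margins = [math.log(0.5) + s for s in leaf_sums]
offset = constant_margin_offset(xgb_margins, tl_margins)
print(offset, math.log(3000.0 / 0.5))
```

On the real models, the margins could come from `bst.predict(dtrain, output_margin=True)` and, assuming the installed Treelite version's `treelite.gtil.predict` accepts `pred_margin=True`, from `treelite.gtil.predict(tl_model, data=X, pred_margin=True).squeeze()`. A constant offset near `log(3000)` minus whatever base margin Treelite applied would point at the `base_score` translation. A `None` result would instead suggest that the trees themselves differ.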

juliuscoburger · Sep 23 '24 08:09