onnxmltools
xgboost prediction doesn't match
Hi,
I ran into an issue with xgboost model conversion and want to report the symptom here. It happens when I convert a model trained on features stored in a pandas DataFrame: the ONNX predictions silently differ from the xgboost predictions, depending on the column names. The code below reproduces the issue.
import numpy as np
from onnxmltools.convert import convert_xgboost
from onnxmltools.convert.common.data_types import FloatTensorType
import onnxruntime as rt
import pandas as pd
from xgboost import XGBClassifier

def test_convert_xgboost(columns):
    num_features = len(columns)
    nrows = 50
    nrows_test = 10
    model_path = 'test_xgb.onnx'

    # prepare data
    X = np.random.random((nrows, num_features))
    X = pd.DataFrame(X, columns=columns)
    X_test = np.random.random((nrows_test, num_features))
    y = np.zeros(nrows)
    y[:nrows//2] = 1

    # train
    xgb = XGBClassifier(use_label_encoder=False)
    xgb.fit(X, y)

    # convert to onnx
    onx = convert_xgboost(xgb,
                          initial_types=[('feature_input',
                                          FloatTensorType([None, num_features]))])
    with open(model_path, 'wb') as f:
        f.write(onx.SerializeToString())

    # test predictions
    sess = rt.InferenceSession(model_path)
    input_name = sess.get_inputs()[0].name
    prob_name = sess.get_outputs()[1].name
    pred_onx = sess.run([prob_name], {input_name: X_test.astype(np.float32)})[0]
    pred = xgb.predict_proba(X_test)
    assert np.allclose(pred_onx, pred)

test_convert_xgboost(['f0','f1','f2','f3','f4']) # pass
test_convert_xgboost(['f1','f2','f3','f4','f5']) # fail because f0 is skipped
test_convert_xgboost(['f0','f1','f2','f4','f3']) # fail because f3 and f4 are swapped
At a glance, the culprit seems to be the way convert_xgboost (or, more specifically, the XGBConverter class) translates column names to feature ids. I can't imagine anyone deliberately jumbling up columns as in the examples above. However, if anyone does so inadvertently, this issue could be cumbersome to debug because it doesn't raise any exception. As a consequence, the wrong predictions could easily be mistaken for a general model performance problem.
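To make the suspected mechanism concrete, here is a minimal sketch in plain Python. It assumes that a name like f3 is read as column index 3 instead of using the column's actual position; this is a guess based on the three test cases above and has not been confirmed against the onnxmltools source. The helper names index_from_name, mismatched_columns, and check_columns are hypothetical and exist only for this illustration.

```python
def index_from_name(name):
    """Assumed (unconfirmed) converter behavior: read 'f<N>' as column index N."""
    if not (name.startswith("f") and name[1:].isdigit()):
        raise ValueError(f"cannot derive a feature index from {name!r}")
    return int(name[1:])


def mismatched_columns(columns):
    """List (name, position, parsed_index) for columns whose name disagrees
    with their position in the DataFrame."""
    return [
        (name, pos, index_from_name(name))
        for pos, name in enumerate(columns)
        if index_from_name(name) != pos
    ]


def check_columns(columns):
    """Hypothetical guard: fail loudly instead of converting a model whose
    column names would be mapped to the wrong input positions."""
    bad = mismatched_columns(columns)
    if bad:
        raise ValueError(f"column names do not match their positions: {bad}")


if __name__ == "__main__":
    # The three column layouts from the reproduction above.
    for cols in (
        ["f0", "f1", "f2", "f3", "f4"],
        ["f1", "f2", "f3", "f4", "f5"],
        ["f0", "f1", "f2", "f4", "f3"],
    ):
        print(cols, "->", mismatched_columns(cols))
```

Under that assumption, only the first layout has names that agree with positions, which matches the pass/fail pattern above. Calling something like check_columns(list(X.columns)) before convert_xgboost, or renaming the columns to f0, f1, ... in positional order before training, might serve as a stopgap until the converter handles this itself.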
Environment: macOS 10.14.6, Python 3.8.6, onnxmltools 1.7.0, onnxruntime 1.6.0, xgboost 1.3.1, pandas 1.2.0, numpy 1.18.5