
How to solve Feature name error while converting an XGBClassifier model to ONNX?

Open AnouarITI opened this issue 1 year ago • 4 comments

I trained an XGBClassifier model, and now I want to convert it to the ONNX format. It should be straightforward using this code:

import onnxmltools 
from skl2onnx.common.data_types import FloatTensorType

initial_types = [('float_input', FloatTensorType([None, X_train.shape[1]]))]

xgb_onnx = onnxmltools.convert_xgboost(xgb.xgb_category_cls, initial_types=initial_types)
onnxmltools.utils.save_model(xgb_onnx, 'xgb_onnx.onnx')

However, I get this error, which is related to one of my feature names:


     77                     feature_id = int(float(feature_id))
     78                 except ValueError:
---> 79                     raise RuntimeError(
     80                         "Unable to interpret '{0}', feature "
     81                         "names should follow pattern 'f%d'.".format(

RuntimeError: Unable to interpret 'state', feature names should follow pattern 'f%d'.

I am not sure what I did wrong.

AnouarITI avatar May 16 '23 13:05 AnouarITI

I came across the same issue. The converter expects either no feature names, or names that follow the pattern "0", "1", ... or "f0", "f1", "f2", ...
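For illustration, the failing check roughly amounts to the following (a sketch reconstructed from the traceback above, not the library's exact code; `is_conform` is a hypothetical helper name):

```python
def is_conform(name: str) -> bool:
    # Mirror the check in the traceback: strip a leading 'f', then the
    # remainder must parse as a number; otherwise the converter raises
    # RuntimeError("Unable to interpret ...").
    feature_id = name[1:] if name.startswith("f") else name
    try:
        int(float(feature_id))
    except ValueError:
        return False
    return True

# 'f0' and '3' pass, 'state' does not
```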

You can work around this issue by renaming the features like this:

booster = model.get_booster()
original_feature_names = booster.feature_names
if original_feature_names is not None:
    onnx_converter_conform_feature_names = [f"f{num}" for num in range(len(original_feature_names))]
    booster.feature_names = onnx_converter_conform_feature_names

But be careful: this overwrites the model's original booster, so from then on the xgboost model's feature names are changed, and calling model.predict with validate_features=True on the original dataset may fail. If you throw the model away after the ONNX conversion, you are fine; otherwise I would suggest making a deep copy of the model first, e.g. via save + load.
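That suggestion can be sketched like this, assuming `model` is a fitted `XGBClassifier` (using `copy.deepcopy` as an alternative to save + load; `rename_for_onnx` is a hypothetical helper name):

```python
import copy

def rename_for_onnx(model):
    """Return a deep copy of the model whose booster feature names follow
    the 'f%d' pattern the converter expects. The original model keeps its
    names, so validate_features=True keeps working with the original data."""
    model_copy = copy.deepcopy(model)
    booster = model_copy.get_booster()
    if booster.feature_names is not None:
        booster.feature_names = [f"f{i}" for i in range(len(booster.feature_names))]
    return model_copy
```

Pass the returned copy to convert_xgboost and keep using the original model for predictions.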

ahallermed avatar May 30 '23 16:05 ahallermed

Do you have an example I could use to replicate the issue?

xadupre avatar Jul 28 '23 08:07 xadupre

Here is a minimal example with which I can reproduce this error (xgboost version 1.7.5):

# %%
from onnxmltools import convert_xgboost
from skl2onnx.common.data_types import FloatTensorType
from xgboost.sklearn import XGBClassifier
import pandas
import numpy as np

num_columns = 5
num_rows = 20
seed = 42
np.random.seed(seed)
X = np.random.random_sample((num_rows, num_columns))
y = np.random.randint(low=0, high=2, size=num_rows)
y = pandas.Series(y)
X = pandas.DataFrame(X)
columns = [f"abc_{num}" for num in range(num_columns)]
X.columns = columns

model = XGBClassifier(random_state=seed)
model = model.fit(X, y)
initial_type = [('float_input', FloatTensorType([None, 5]))]
convert_xgboost(model=model, initial_types=initial_type, target_opset=14)

ahallermed avatar Jul 28 '23 13:07 ahallermed

> Do you have an example I could use to replicate the issue?

I had the same issue. I resolved it by renaming the features f0-fn (where n is the number of features minus 1). The problem occurs when a number is skipped (f0, f1, f2, f4) or when the feature names don't start from f0.

dmukuna avatar Apr 21 '24 17:04 dmukuna
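The rule described in the last comment can be written as a quick sanity check (a sketch based on the behavior reported in this thread; `names_are_conform` is a hypothetical helper name):

```python
def names_are_conform(names):
    # Accept only the exact sequence f0, f1, ..., f(n-1):
    # a skipped number or a sequence not starting at f0 fails.
    return list(names) == [f"f{i}" for i in range(len(names))]
```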