Converting LightGBM Regressor to ONNX seems impossible with mixed float and string initial_types
Hi,
I'm trying to convert a LightGBM LGBMRegressor using the convert_lightgbm function.
I am using a mix of categorical (string) and float values.
However, when I try to specify different initial_types (one per column), I get this error:
RuntimeError: For operator LgbmRegressor (type: LgbmRegressor), at most 1 input(s) is(are) supported but we got 15 input(s) which are ['postal_code_mission', 'do_code', 'dz_code', 'agency_code', 'adecco_code', 'siret', 'cod_prs_prc', 'cod_zep_ctr', 'cod_sgm_tt_con', 'depenses_client', 'month', 'hourly_rate', 'contract_duration', 'tension', 'difficulty']
I think it means I can only specify one input, so it has to be either float or string, but not both?
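I tried this:
import onnxmltools
from onnxmltools.convert.common.data_types import FloatTensorType
# Declare a single float input covering all feature columns
initial_type = [('float_input', FloatTensorType([None, X_train.shape[1]]))]
# Convert the LightGBM model to ONNX format
onnx_model = onnxmltools.convert_lightgbm(model, initial_types=initial_type)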
The conversion worked, but at inference time I got this error:
InvalidArgument: [ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Unexpected input data type. Actual: (tensor(string)) , expected: (tensor(float))
Which makes sense, since I'm passing strings to something that expects floats...
Am I doing something wrong, or is there no way to convert an LGBMRegressor to ONNX format with both string and float tensors?
Any updates here?
Any updates?
Can you share more information about how you trained the model?
Closing the issue. Feel free to reopen it.
We're having the same issue: _parse_lightgbm_simple_model doesn't handle the case where the inputs are derived from a pandas DataFrame (i.e. each column is a separate input variable and the dtypes are heterogeneous). So it just says "Okay, we have N inputs, one per column, LGTM," and then when the process gets to shape calculation, it dies because the shape calculator expects only a single input variable.
I'm not sure there's a good solution, other than the parser injecting some conversion logic that turns non-numeric categoricals into numerics, doing a concat, and then finally the tree ensemble. It would need to attach this categorical -> numeric mapping to the ensemble operator too, so the actual converter can use it to rewrite the trees.
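Until something like that exists in the converter, the same idea can be applied by hand outside of it. Below is a minimal sketch in plain Python, not anything onnxmltools provides: it builds a categorical -> numeric mapping from the training rows and flattens every row into one list of floats, so that the model can be trained and converted with a single FloatTensorType input. The helper names (fit_category_codes, encode_row), the sample values, and the choice of -1.0 for unseen categories are all assumptions made for illustration; only the column names come from the error message above.

```python
from typing import Dict, List, Mapping, Sequence, Union

Value = Union[str, int, float]


def fit_category_codes(
    rows: Sequence[Mapping[str, Value]],
    categorical_columns: Sequence[str],
) -> Dict[str, Dict[str, int]]:
    """Build column -> {category string: integer code} from training rows."""
    codes: Dict[str, Dict[str, int]] = {}
    for col in categorical_columns:
        # Sort so the mapping is deterministic across runs.
        categories = sorted({str(row[col]) for row in rows})
        codes[col] = {cat: i for i, cat in enumerate(categories)}
    return codes


def encode_row(
    row: Mapping[str, Value],
    feature_order: Sequence[str],
    codes: Mapping[str, Mapping[str, int]],
    unknown: float = -1.0,  # assumption: unseen categories map to -1.0
) -> List[float]:
    """Flatten one mixed string/float row into a single list of floats."""
    encoded: List[float] = []
    for col in feature_order:
        if col in codes:
            encoded.append(float(codes[col].get(str(row[col]), unknown)))
        else:
            encoded.append(float(row[col]))
    return encoded


# Hypothetical sample data using three of the column names from the issue.
train_rows = [
    {"agency_code": "A12", "month": 3, "hourly_rate": 14.5},
    {"agency_code": "B07", "month": 7, "hourly_rate": 11.0},
    {"agency_code": "A12", "month": 1, "hourly_rate": 12.25},
]
feature_order = ["agency_code", "month", "hourly_rate"]

codes = fit_category_codes(train_rows, ["agency_code"])
matrix = [encode_row(r, feature_order, codes) for r in train_rows]
unseen = encode_row(
    {"agency_code": "Z99", "month": 2, "hourly_rate": 10.0},
    feature_order,
    codes,
)
```

The mapping has to be saved alongside the ONNX file and applied to every row before inference, with the result passed as one float32 array to the single float input. This only sidesteps the multi-input problem; whether the converted trees treat the encoded columns the same way LightGBM does is something to verify by comparing predictions from both.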