onnxmltools
LightGBM categorical variables
Is there any indication as to when this might be supported as a fit parameter for LGBMClassifier and LGBMRegressor?
Hello @MotoRZR! As of now, we don't have the resources to support LightGBM categorical variables in LGBMClassifier and LGBMRegressor. We welcome community contributions!
Without knowing that categorical features aren't yet supported, we ran into this issue while converting our model. Just leaving a comment that might help the implementation in the future.
The command was simply:
onnx_model = onnxmltools.convert_lightgbm(lgb_model, initial_types=inputs)
At the bottom of the stack trace:
~/.pyenv/versions/3.7.3/lib/python3.7/site-packages/onnx/helper.py in make_attribute(key, value, doc_string)
267 else:
268 raise ValueError(
--> 269 "You passed in an iterable attribute but I cannot figure out "
270 "its applicable type.")
271 else:
ValueError: You passed in an iterable attribute but I cannot figure out its applicable type.
The error message "You passed in an iterable attribute but I cannot figure out its applicable type."
only partially explains the problem. Our model contains both string and float values, which should map to byte arrays and TensorProto instances,
both of which are supported on their own.
It blows up because make_attribute checks whether all values in the iterable are of the same type, and here they are not.
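To make the failure mode concrete, here is a simplified pure-Python sketch of an "all values must share one type" check. It is not the actual onnx code; the function name `infer_iterable_attr_type` and the returned labels are made up for illustration, and the real `make_attribute` handles more types than this.

```python
from numbers import Integral, Real


def infer_iterable_attr_type(values):
    """Return a label for the single type shared by all values, or raise.

    Hypothetical helper: it only mimics the idea of a same-type check,
    it does not reproduce onnx.helper.make_attribute.
    """
    values = list(values)
    # Every element is an integer.
    if all(isinstance(v, Integral) for v in values):
        return "ints"
    # Every element is a real number (a mix of ints and floats is allowed here).
    if all(isinstance(v, Real) for v in values):
        return "floats"
    # Every element is text or bytes.
    if all(isinstance(v, (str, bytes)) for v in values):
        return "strings"
    # Anything else, including a mix of strings and floats, is rejected.
    raise ValueError("values do not share a single supported type")


print(infer_iterable_attr_type([0.5, 1.5]))      # all floats
print(infer_iterable_attr_type(["0||2", "1"]))   # all strings
try:
    infer_iterable_attr_type(["0||2", 1.5])      # string mixed with float
except ValueError as exc:
    print("rejected:", exc)
```

A list of thresholds that is purely numeric or purely string passes, while a list mixing the two is rejected, which matches what we saw with a model that has both numerical and categorical splits.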
Although this one doesn't have anything to do with categorical features, I found a related issue: https://github.com/onnx/onnx/pull/1940
@vinitra Just curious, is there any workaround to convert categorical features to ONNX format?
@sheon-han-zocdoc To the best of my knowledge, there is no way to convert categorical features without getting your hands dirty.
The LightGBM model dump represents splits on categorical features as vertical-bar-separated strings of category indices. To express this in ONNX, the tree node would have to take either the strings as-is or bit-encoded integers. My understanding is that ONNX does not support expressing such tree models yet.
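As a rough sketch of the two encodings mentioned above, the snippet below parses a bar-separated threshold string into category indices and packs them into a bit-encoded integer. The example string `"0||2||5"` and the helper names are assumptions for illustration; the parser simply splits on `|` and ignores empty tokens, so it does not depend on whether the dump uses single or double bars.

```python
def parse_categorical_threshold(threshold):
    """Turn a bar-separated string such as "0||2||5" into sorted category indices."""
    return sorted({int(token) for token in threshold.split("|") if token})


def to_bitmask(categories):
    """Bit-encode category indices: bit i is set when category i is in the split."""
    mask = 0
    for category in categories:
        mask |= 1 << category
    return mask


def matches_split(category, mask):
    """True when the category index belongs to the set encoded in mask."""
    return category >= 0 and bool((mask >> category) & 1)


categories = parse_categorical_threshold("0||2||5")
mask = to_bitmask(categories)
print(categories, mask)
print(matches_split(2, mask), matches_split(1, mask))
```

Either form describes a set-membership test rather than the "value <= threshold" comparison that a numeric split uses, which is why it does not fit directly into a numeric tree node.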
Another way is to map the categorical features to float values for each tree before feeding them into the tree operator. That means generating quite a lot of extra features, though.
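One possible reading of that workaround, sketched in plain Python: for every categorical split, precompute a 0.0/1.0 indicator feature, so that the tree only needs an ordinary numeric comparison against 0.5. The function name, the feature names, and the split list are hypothetical, and this is not something onnxmltools does today.

```python
def indicator_features(row, categorical_splits):
    """Build one float feature per categorical split.

    row: dict mapping feature name -> category index
    categorical_splits: list of (feature name, set of category indices)
    """
    return [
        1.0 if row[name] in categories else 0.0
        for name, categories in categorical_splits
    ]


# Hypothetical splits collected from all trees of a model.
splits = [("color", {0, 2}), ("color", {1}), ("size", {3})]
row = {"color": 2, "size": 1}

features = indicator_features(row, splits)
print(features)  # one generated float per categorical split
```

Each distinct categorical split adds one generated input, so a model with many trees can end up with far more generated features than original ones, which is the cost mentioned above.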
+1 for this feature. It would be great if onnxmltools supported mixed-type input vectors for LightGBM.
If it's doable in PMML and Core ML it should be doable in ONNX
Any update on this?
Yes any update welcome!