
LightGBM categorical variables

Open onacrame opened this issue 5 years ago • 9 comments

Is there any indication as to when this might be supported as a fit parameter for LGBMClassifier and LGBMRegressor?

onacrame, Jun 04 '19 20:06

Hello @MotoRZR! As of now, we don't have the resources to support LightGBM categorical variables in LGBMClassifier and LGBMRegressor. We welcome community contributions!

vinitra-zz, Jun 10 '19 21:06

We didn't know that categorical features aren't supported yet, and ran into this issue while converting our model. Leaving a comment here in case it helps a future implementation.

The command was simply:

onnx_model = onnxmltools.convert_lightgbm(lgb_model, initial_types=inputs)

At the bottom of the stack trace:

~/.pyenv/versions/3.7.3/lib/python3.7/site-packages/onnx/helper.py in make_attribute(key, value, doc_string)
    267         else:
    268             raise ValueError(
--> 269                 "You passed in an iterable attribute but I cannot figure out "
    270                 "its applicable type.")
    271     else:

ValueError: You passed in an iterable attribute but I cannot figure out its applicable type.

The error message "You passed in an iterable attribute but I cannot figure out its applicable type." only partially explains the problem. Our model contains both strings and floats, so the values should be instances of TensorProto and byte arrays, both of which are supported.

But it blows up because it checks whether all values are of the same type.

Although this one doesn't have anything to do with categorical features, I found a related issue: https://github.com/onnx/onnx/pull/1940
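To illustrate the same-type check described above, here is a minimal sketch in plain Python. It is a simplified stand-in, not the actual onnx source: the function name `infer_attribute_kind` and the returned labels are made up for illustration. It only shows why a list mixing floats and strings is rejected even though each type is fine on its own.

```python
import numbers


def infer_attribute_kind(values):
    """Return a label for a homogeneous list, or raise ValueError for a mixed one.

    Hypothetical helper: a simplified imitation of an "all values share one
    type" check, not the real onnx implementation.
    """
    values = list(values)
    if all(isinstance(v, numbers.Integral) for v in values):
        return "ints"
    if all(isinstance(v, numbers.Real) for v in values):
        return "floats"
    if all(isinstance(v, (str, bytes)) for v in values):
        return "strings"
    # No single type covers every element, so the list is rejected.
    raise ValueError(
        "You passed in an iterable attribute but I cannot figure out "
        "its applicable type."
    )


print(infer_attribute_kind([0.5, 1.5]))     # floats
print(infer_attribute_kind(["0||1", "2"]))  # strings
try:
    infer_attribute_kind([0.5, "0||1"])     # float and string mixed
except ValueError as exc:
    print("mixed list rejected:", exc)
```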

sheon-han-zocdoc, Oct 03 '19 14:10

@vinitra Just curious, is there any workaround for converting a model with categorical features to ONNX format?

sheon-han-zocdoc, Oct 03 '19 14:10

@sheon-han-zocdoc To the best of my knowledge, there is no way to convert categorical features without getting your hands dirty.

The LightGBM model dump represents splits on categorical features as vertical-bar-separated strings of category indices. To express this in ONNX, the tree node would have to take either the strings as-is or bit-encoded integers. My understanding is that ONNX does not yet support expressing such tree models.
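As a rough illustration of the two encodings mentioned above, the sketch below parses a split string into category indices and packs them into a bitmask. The `||` separator, the example string, and the helper names are assumptions for illustration only; check them against an actual model dump before relying on them.

```python
def parse_categories(threshold, sep="||"):
    """Split a string such as "0||2||5" into a sorted list of category indices.

    The separator is an assumption; adjust it to match the real dump.
    """
    return sorted(int(part) for part in threshold.split(sep) if part)


def to_bitmask(categories):
    """Pack category indices into one integer, one bit per category."""
    mask = 0
    for index in categories:
        mask |= 1 << index
    return mask


def in_split_set(category, mask):
    """Return True if the category's bit is set in the mask."""
    return category >= 0 and (mask >> category) & 1 == 1


cats = parse_categories("0||2||5")
mask = to_bitmask(cats)
print(cats)       # [0, 2, 5]
print(bin(mask))  # 0b100101
print(in_split_set(2, mask), in_split_set(3, mask))  # True False
```

A tree node would then need to test set membership rather than compare against a numeric threshold, which is the part that is missing on the ONNX side.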

Another approach is to map the categorical features to float values for each tree before feeding them into the tree operator. That means generating quite a lot of features, though.
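One way to read the mapping idea above: for every categorical split, precompute a 0.0/1.0 indicator saying whether the input's category belongs to that split's set, so the tree can compare a plain float against 0.5. This is a minimal sketch with made-up split data and a hypothetical helper name; it is not something onnxmltools provides.

```python
def build_indicator_features(row, categorical_splits):
    """Turn each categorical split into one float feature.

    row: dict mapping feature name -> category index (hypothetical input).
    categorical_splits: list of (feature_name, set_of_category_indices),
        one entry per categorical split found in the trees.
    """
    return [
        1.0 if row[feature] in categories else 0.0
        for feature, categories in categorical_splits
    ]


# Hypothetical splits collected from the trees of a model.
splits = [("color", {0, 2}), ("color", {1}), ("size", {3, 4})]
row = {"color": 2, "size": 1}

features = build_indicator_features(row, splits)
print(features)  # [1.0, 0.0, 0.0]
```

Each categorical split adds one generated feature, so the feature count grows with the number of such splits across all trees.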

hongzmsft, Oct 17 '19 17:10

+1 for this feature. It would be great if onnxmltools supported mixed-type input vectors for LightGBM.

julioasotodv, Dec 02 '19 01:12

If it's doable in PMML and Core ML, it should be doable in ONNX.

onacrame, Dec 06 '19 19:12

Any update on this?

Boottexy, Jan 04 '21 11:01

Yes, any update would be welcome!

mik3githubber, May 15 '22 21:05