
ValueError: Unable to create node 'TreeEnsembleClassifier' with name='WrappedLightGbmBoosterClassifier'.

Open Bhuvanamitra opened this issue 2 years ago • 10 comments

Hi, I am trying to convert a LightGBM binary classification model to ONNX.

I have defined calculate_lightgbm_output_shapes, lightgbm_parser, and WrappedLightGbmBoosterClassifier, and registered the converter with update_registered_converter(WrappedLightGbmBoosterClassifier, 'WrappedLightGbmBoosterClassifier', calculate_lightgbm_output_shapes, convert_lightgbm, parser=lightgbm_parser, options={'zipmap': [False, True], 'nocl': [False, True]}).

To convert the model, I used onnx_model = to_onnx(lgm_model, initial_types=[('feature_input', FloatTensorType([None, 2021]))], options={WrappedLightGbmBoosterClassifier: {'zipmap': False}}, target_opset={'': 15, 'ai.onnx.ml': 2})

But I am getting the following error: ValueError: Unable to create node 'TreeEnsembleClassifier' with name='WrappedLightGbmBoosterClassifier'.

This is the error trace:

`File ~/jupyter_dir/jupyter_env/lib/python3.8/site-packages/onnxmltools/convert/lightgbm/operator_converters/LightGbm.py:519, in convert_lightgbm(scope, operator, container)
    515 probability_tensor_name = scope.get_unique_variable_name(
    516     'probability_tensor')
    517 label_tensor_name = scope.get_unique_variable_name('label_tensor')
--> 519 container.add_node(
    520     'TreeEnsembleClassifier', operator.input_full_names,
    521     [label_tensor_name, probability_tensor_name],
    522     op_domain='ai.onnx.ml', **attrs)
    524 prob_tensor = probability_tensor_name
    526 if gbm_model.boosting_type == 'rf':

File ~/jupyter_dir/jupyter_env/lib/python3.8/site-packages/skl2onnx/common/_container.py:644, in ModelComponentContainer.add_node(self, op_type, inputs, outputs, op_domain, op_version, name, **attrs)
    641 node = make_node(op_type, inputs, outputs, name=name,
    642     _dtype=dtype, **attrs)
    643 except ValueError as e:
--> 644 raise ValueError("Unable to create node '{}' with name='{}'."
    645     "".format(op_type, name)) from e
    646 node.domain = op_domain
    648 self.node_domain_version_pair_sets.add((op_domain, op_version))

ValueError: Unable to create node 'TreeEnsembleClassifier' with name='WrappedLightGbmBoosterClassifier'.`

I am unable to figure out why this is happening. Please help to resolve this. Thanks in advance.

Bhuvanamitra avatar Oct 05 '22 11:10 Bhuvanamitra

Could you use this example as a reference https://onnx.ai/sklearn-onnx/auto_tutorial/plot_gexternal_lightgbm.html ?

xadupre avatar Oct 07 '22 14:10 xadupre

Hi, when I try that method, I get the following error:

MissingShapeCalculator: Unable to find a shape calculator for type '<class 'lightgbm.basic.Booster'>'. It usually means the pipeline being converted contains a transformer or a predictor with no corresponding converter implemented in sklearn-onnx. If the converted is implemented in another library, you need to register the converted so that it can be used by sklearn-onnx (function update_registered_converter). If the model is not yet covered by sklearn-onnx, you may raise an issue to https://github.com/onnx/sklearn-onnx/issues to get the converter implemented or even contribute to the project. If the model is a custom model, a new converter must be implemented. Examples can be found in the gallery.

Bhuvanamitra avatar Oct 10 '22 07:10 Bhuvanamitra

This case is not exposed as an example but is tested in a unit test: https://github.com/onnx/sklearn-onnx/blob/main/tests_onnxmltools/test_lightgbm.py. You should find the missing pieces in that file.

xadupre avatar Oct 10 '22 09:10 xadupre

Hi, I ran the same code as in the link above and I get the same error as in the title of this issue. I have put the Python notebook and the error trace in the following repo. Please let me know how to resolve this. https://github.com/Bhuvanamitra/LightGBMToONNX/tree/main

Bhuvanamitra avatar Oct 11 '22 06:10 Bhuvanamitra

I tried the following example and it worked. This issue comes from a feature your tree is using and which is not supported by the converter. Given the error you mention, it might be caused by an unexpected rule in a node (usually only < is used).

import numpy
from sklearn.base import ClassifierMixin
from lightgbm import Dataset, train, Booster
from skl2onnx import to_onnx, update_registered_converter
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.shape_calculator import (
    calculate_linear_classifier_output_shapes,  # noqa
    calculate_linear_regressor_output_shapes,
)
from onnxmltools.convert.lightgbm.operator_converters.LightGbm import (
    convert_lightgbm  # noqa
)
from onnxmltools.convert.lightgbm._parse import WrappedBooster  # noqa
from skl2onnx._parse import (
    _parse_sklearn_classifier, _parse_sklearn_simple_model)


def calculate_lightgbm_output_shapes(operator):
    op = operator.raw_operator
    if hasattr(op, "_model_dict"):
        objective = op._model_dict['objective']
    elif hasattr(op, 'objective_'):
        objective = op.objective_
    else:
        raise RuntimeError(  # pragma: no cover
            "Unable to find attributes '_model_dict' or 'objective_' in "
            "instance of type %r (list of attributes=%r)." % (
                type(op), dir(op)))
    if objective.startswith('binary') or objective.startswith('multiclass'):
        return calculate_linear_classifier_output_shapes(operator)
    if objective.startswith('regression'):  # pragma: no cover
        return calculate_linear_regressor_output_shapes(operator)
    raise NotImplementedError(  # pragma: no cover
        "Objective '{}' is not implemented yet.".format(objective))
    
    
def lightgbm_parser(scope, model, inputs, custom_parsers=None):
    if hasattr(model, "fit"):
        raise TypeError(  # pragma: no cover
            "This converter does not apply on type '{}'."
            "".format(type(model)))

    if len(inputs) == 1:
        wrapped = WrappedBooster(model)
        objective = wrapped.get_objective()
        if objective.startswith('binary'):
            wrapped = WrappedLightGbmBoosterClassifier(wrapped)
            return _parse_sklearn_classifier(
                scope, wrapped, inputs, custom_parsers=custom_parsers)
        if objective.startswith('multiclass'):
            wrapped = WrappedLightGbmBoosterClassifier(wrapped)
            return _parse_sklearn_classifier(
                scope, wrapped, inputs, custom_parsers=custom_parsers)
        if objective.startswith('regression'):  # pragma: no cover
            return _parse_sklearn_simple_model(
                scope, wrapped, inputs, custom_parsers=custom_parsers)
        raise NotImplementedError(  # pragma: no cover
            "Objective '{}' is not implemented yet.".format(objective))

    # Multiple columns
    this_operator = scope.declare_local_operator('LightGBMConcat')
    this_operator.raw_operator = model
    this_operator.inputs = inputs
    var = scope.declare_local_variable(
        'Xlgbm', inputs[0].type.__class__([None, None]))
    this_operator.outputs.append(var)
    return lightgbm_parser(scope, model, this_operator.outputs,
                           custom_parsers=custom_parsers)
        
        
class WrappedLightGbmBoosterClassifier(ClassifierMixin):
    """
    Trick to wrap a LGBMClassifier into a class.
    """

    def __init__(self, wrapped):  # pylint: disable=W0231
        for k in {'boosting_type', '_model_dict', '_model_dict_info',
                  'operator_name', 'classes_', 'booster_', 'n_features_',
                  'objective_'}:
            if hasattr(wrapped, k):
                setattr(self, k, getattr(wrapped, k))


update_registered_converter(
    WrappedLightGbmBoosterClassifier,
    'WrappedLightGbmBoosterClassifier',
    calculate_lightgbm_output_shapes,
    convert_lightgbm, parser=lightgbm_parser,
    options={'zipmap': [False, True], 'nocl': [False, True]})
update_registered_converter(
    WrappedBooster, 'WrappedBooster',
    calculate_lightgbm_output_shapes,
    convert_lightgbm, parser=lightgbm_parser,
    options={'zipmap': [False, True], 'nocl': [False, True]})
update_registered_converter(
    Booster, 'LightGbmBooster', calculate_lightgbm_output_shapes,
    convert_lightgbm, parser=lightgbm_parser)


# Toy 3-class dataset with two features, used only to reproduce the conversion.
X = [[0, 1], [1, 1], [2, 0], [1, 2], [-1, 2], [1, -2]]
X = numpy.array(X, dtype=numpy.float32)
y = [0, 1, 0, 1, 2, 2]
data = Dataset(X, label=y)
model = train(
    {'boosting_type': 'gbdt', 'objective': 'multiclass',
     'n_estimators': 3, 'min_child_samples': 1, 'num_class': 3},
    data)


# The declared input width should match the number of training features
# (2 here; the original report used 2021).
onnx_model = to_onnx(
    model,
    initial_types=[('feature_input', FloatTensorType([None, X.shape[1]]))],
    options={WrappedLightGbmBoosterClassifier: {'zipmap': False}},
    target_opset={'': 15, 'ai.onnx.ml': 2})

xadupre avatar Oct 13 '22 09:10 xadupre

> This issue comes from a feature your tree is using and which is not supported by the converter. Given the error you mention, it might be caused by an unexpected rule in a node (usually only < is used).

Thank you for the reply. I split the dataset into parts and trained a model on each part, and yes, those conversions worked. In that case, 1943 out of 2021 features were used by LightGBM.

But when I train a model on the whole dataset, I am unable to convert it. In this case, 1975 out of 2021 features were used.

Is there a way to identify which feature is causing this issue? Can you shed some light on how to handle this kind of issue in conversion?

Bhuvanamitra avatar Oct 13 '22 09:10 Bhuvanamitra

It is difficult without knowing the tree, and I don't have a way to replicate the problem. However, if you are willing to write some code, you could modify the LightGBM converter in onnxmltools just before it calls container.add_node: search for the first value in attrs['nodes_values'] that is not a float, and then look at the corresponding value in attrs['nodes_modes'].
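The check described above could be sketched roughly as follows. This is only an illustration: the helper name find_non_float_node_values and the example dictionary are hypothetical, and the only assumption about the converter is that attrs['nodes_values'] and attrs['nodes_modes'] are parallel lists with one entry per tree node.

```python
def find_non_float_node_values(attrs):
    """List entries of attrs['nodes_values'] that are not Python floats.

    Hypothetical debugging helper, not part of onnxmltools. Each result is
    (node index, type name, value, matching entry of attrs['nodes_modes']).
    """
    values = attrs.get("nodes_values", [])
    modes = attrs.get("nodes_modes", [])
    offenders = []
    for index, value in enumerate(values):
        if not isinstance(value, float):
            mode = modes[index] if index < len(modes) else None
            offenders.append((index, type(value).__name__, value, mode))
    return offenders


# Made-up attributes standing in for what the converter builds.
example_attrs = {
    "nodes_values": [0.5, 3, 0.0],
    "nodes_modes": ["BRANCH_LEQ", "BRANCH_LEQ", "LEAF"],
}
print(find_non_float_node_values(example_attrs))
# prints [(1, 'int', 3, 'BRANCH_LEQ')]
```

Calling such a helper on attrs right before container.add_node would show which node carries the unexpected value and which rule it uses.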

xadupre avatar Oct 13 '22 18:10 xadupre

lightGBM_model.txt This is the lightGBM model file.

lightgbm_conf.txt This is the configuration used in building the lightGBM model.

I will try to debug on my end using your suggestion. Could you also go through these files and, with your expertise, help resolve this?

Bhuvanamitra avatar Oct 14 '22 05:10 Bhuvanamitra

Thanks for sharing your model. LightGBM creates a list that mixes integers and floats for the node values, and ONNX checks that the list holds only one type. I created a PR to fix it: https://github.com/onnx/onnxmltools/pull/591. It works for me on your tree.
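To illustrate the idea only (this is not the code from the PR, and the helper name as_uniform_floats is hypothetical), a list that mixes integers and floats can be normalized to a single type before it is used as a node attribute:

```python
def as_uniform_floats(values):
    """Cast every entry to a Python float so the list holds one type.

    Hypothetical helper for illustration; the actual fix in onnxmltools
    may be implemented differently.
    """
    return [float(v) for v in values]


# Made-up node values mixing int and float, as described above.
mixed = [0.5, 3, 0.0]
uniform = as_uniform_floats(mixed)

print(sorted({type(v).__name__ for v in mixed}))    # types before casting
print(sorted({type(v).__name__ for v in uniform}))  # types after casting
```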

xadupre avatar Oct 16 '22 10:10 xadupre

Thanks a lot. The conversion worked. When can we expect this to be pushed into a stable release?

Bhuvanamitra avatar Oct 17 '22 06:10 Bhuvanamitra