
[LGBMRanker] Bug in _parse.py and misleading documentation in convert.py


Issue

In onnxmltools/convert/lightgbm/convert.py, the docstring indicates that only LGBMClassifier, LGBMRegressor and Booster are supported:

This function produces an equivalent ONNX model of the given lightgbm model.
The supported lightgbm modules are listed below.
    
* `LGBMClassifiers <https://lightgbm.readthedocs.io/en/latest/pythonapi/lightgbm.LGBMClassifier.html>`_
* `LGBMRegressor <https://lightgbm.readthedocs.io/en/latest/pythonapi/lightgbm.LGBMRegressor.html>`_
* `Booster <https://lightgbm.readthedocs.io/en/latest/pythonapi/lightgbm.Booster.html>`_

Steps

  1. We trained our model using the official microsoft/LightGBM package, and inspecting its source code shows that it assigns the value "lambdarank" as an LGBMRanker's objective function:
if isinstance(self, LGBMRegressor):
    self._objective = "regression"
elif isinstance(self, LGBMClassifier):
    self._objective = "binary"
elif isinstance(self, LGBMRanker):
    self._objective = "lambdarank"	
  2. But when we tried to convert the model, we got an "unsupported LightGbm objective: lambdarank" error. We believe the issue is in convert/lightgbm/_parse.py, where it checks for the strings "binary" and "regression" but not "lambdarank":
if _model_dict['objective'].startswith('binary'):
    self.operator_name = 'LgbmClassifier'
elif _model_dict['objective'].startswith('regression'):
    self.operator_name = 'LgbmRegressor'
  3. To bypass these conditionals, we changed two occurrences of the string "lambdarank" in our model.txt (see the sketch after this list). After that, we were able to convert the model to ONNX, and the converted model made correct predictions:
# our changes in model.txt
tree
version=v3
num_class=1
.
.
.
objective=lambdarank => objective=regression # change 1
.
.
.
parameters:
[boosting: gbdt]
[objective: lambdarank] => [objective: regression] # change 2
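For reference, the same workaround can be done in memory instead of editing model.txt by hand. A rough sketch, assuming a trained LGBMRanker; the synthetic X, y, group data and the "input" tensor name below are placeholders:

# Sketch of the workaround: patch the objective strings in the Booster's
# text dump, rebuild a Booster from it, and convert that instead.
import numpy as np
import lightgbm as lgb
import onnxmltools
from onnxmltools.convert.common.data_types import FloatTensorType
from onnxmltools.utils import save_model

# placeholder training data: 100 rows, 4 features, two query groups of 50
X = np.random.rand(100, 4).astype(np.float32)
y = np.random.randint(0, 5, size=100)
group = [50, 50]

ranker = lgb.LGBMRanker(objective="lambdarank")
ranker.fit(X, y, group=group)

model_txt = ranker.booster_.model_to_string()
# change 1 and change 2 from model.txt, applied in memory
model_txt = model_txt.replace("objective=lambdarank", "objective=regression")
model_txt = model_txt.replace("[objective: lambdarank]", "[objective: regression]")

patched_booster = lgb.Booster(model_str=model_txt)
onnx_model = onnxmltools.convert_lightgbm(
    patched_booster,
    initial_types=[("input", FloatTensorType([None, X.shape[1]]))],
)
save_model(onnx_model, "lgb.onnx")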

We suggest creating a fix in onnxmltools/convert/lightgbm/_parse.py to accept models with lambdarank as objective.
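For illustration only, here is a tiny standalone mirror of that check with the suggested relaxation; pick_operator_name is a hypothetical helper, not part of onnxmltools, and routing lambdarank to the regressor path simply mirrors the manual workaround above:

# Hypothetical helper mirroring the objective check in _parse.py, with
# lambdarank treated like regression, as the manual workaround does.
def pick_operator_name(objective: str) -> str:
    if objective.startswith('binary'):
        return 'LgbmClassifier'
    if objective.startswith('regression') or objective.startswith('lambdarank'):
        return 'LgbmRegressor'
    raise ValueError('unsupported LightGbm objective: %s' % objective)

print(pick_operator_name('lambdarank'))  # -> LgbmRegressor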

sheon-han-zocdoc avatar Sep 30 '19 18:09 sheon-han-zocdoc

@sheon-han-zocdoc Thanks for the report. The original LightGBM converter only supports models obtained from the scikit-learn interface, and only the regressor and classifier. When I contributed the Booster converter, I reused the code for the regressor and classifier. Since many other objectives and features are not tested, I did not allow conversion of those models.

I believe conversion from the LightGBM booster model is the way to go, and I am working on that. Can you confirm that, in your case, simply bypassing the objective check does not yield different results? If so, I can add some minimal test cases and relax the restriction.

hongzmsft avatar Oct 17 '19 16:10 hongzmsft

A follow-up on this issue: we are experiencing a similar problem, where we want to convert an LGBMRanker to ONNX. We basically applied exactly what @sheon-han-zocdoc suggested to work around it.

Regarding @hongzmsft's question about whether bypassing the objective check changes the results, here is what we got.

Code:

print(benchmark)
print(pred_onx.flatten())
np.allclose(pred_onx.flatten(), benchmark, atol=1e-6)

Result:

[-1.65461112 -2.36691972 -1.14647361 ... -2.69576259 -2.66562529 -2.57748439]
[-1.6546111  -2.3669205  -1.1464738  ... -2.6957629  -2.6656258  -2.577485  ]
True

The difference is within the 1e-6 tolerance but not 1e-7. This is most likely just floating-point error introduced by the conversion, and 1e-6 is entirely acceptable for us.
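For anyone reproducing this, the comparison can be run roughly like this (a sketch, not our exact code; "lgb.onnx", X_test, and ranker are placeholders for the actual artifacts):

# Rough sketch of the LightGBM-vs-ONNX comparison above.
import numpy as np
import onnxruntime as rt

benchmark = ranker.predict(X_test)  # LightGBM scores
sess = rt.InferenceSession("lgb.onnx")
input_name = sess.get_inputs()[0].name
pred_onx = sess.run(None, {input_name: X_test.astype(np.float32)})[0]

print(benchmark)
print(pred_onx.flatten())
print(np.allclose(pred_onx.flatten(), benchmark, atol=1e-6))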

go2ready avatar Jun 25 '20 12:06 go2ready

Any chance for this easy fix to be merged? I'm also obtaining equivalent results from LightGBM and ONNX by changing the objective from lambdarank to regression in the model.

It would be great if users could avoid this "hack" by letting the converter handle rankers the same way as regressors.

karllessard avatar Apr 19 '22 13:04 karllessard

Hi all, a model saved this way works very well in Python. I also checked that it returns exactly the same predictions.

However, the predictions (and their order) in C++ are different from the predictions in Python.

I'm using the following code to load the model and get predictions in C++, and I think output_label_float[0] should be exactly the same value as the prediction in Python.

Do you have the same problem?

std::string model_file = "lgb.onnx";
Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "example-model-explorer");
Ort::SessionOptions session_options;
Ort::Experimental::Session session = Ort::Experimental::Session(env, model_file, session_options); // access experimental components via the Experimental namespace
MatchingFunc_t matchONNX = [](const GlobalFwdTrack &mchTrack,
                              const TrackParCovFwd &mftTrack) -> double {
  std::vector<std::string> input_names;
  std::vector<std::vector<int64_t>> input_shapes;
  std::vector<std::string> output_names;
  std::vector<std::vector<int64_t>> output_shapes;
  input_names = session.GetInputNames();
  input_shapes = session.GetInputShapes();
  output_names = session.GetOutputNames();
  output_shapes = session.GetOutputShapes();

  auto input_shape = input_shapes[0];
  input_shape[0] = 1;

  std::vector<float> input_tensor_values;
  input_tensor_values = getVariables(mchTrack, mftTrack);

  std::vector<Ort::Value> input_tensors;
  input_tensors.push_back(Ort::Experimental::Value::CreateTensor<float>(input_tensor_values.data(), input_tensor_values.size(), input_shape));
  std::vector<Ort::Value> output_tensors =session.Run(input_names, input_tensors, output_names);
  const int *output_label = output_tensors[0].GetTensorData<int>();
  const float *output_label_float = output_tensors[0].GetTensorData<float>();
  auto score = 1 - output_label_float[0];

  return score;
};

RenEjima avatar Jul 12 '22 15:07 RenEjima