sklearn-onnx icon indicating copy to clipboard operation
sklearn-onnx copied to clipboard

Duplicated coefficients in LinearClassifier

Open wsperat opened this issue 1 year ago • 3 comments

I recently inherited a legacy project that contains a Logistic Regression for binary classification, stored in ONNX format, and converted with skl2onnx. When trying to do a bit of model introspection, I was surprised to find that both the model coefficients and its intercept are stored twice, where the second copy seems to be the negative of the first. Looking into the conversion code I found the following line in skl2onnx.operator_converters.linear_classifier:

op = operator.raw_operator
coefficients = op.coef_.flatten().astype(float).tolist()
classes = get_label_classes(scope, op)
number_of_classes = len(classes)
# some more code
if number_of_classes == 2:
    coefficients = list(map(lambda x: -1 * x, coefficients)) + coefficients
    intercepts = list(map(lambda x: -1 * x, intercepts)) + intercepts

This certainly explains the duplication, but I was wondering about the reason for storing values in this way. Thanks!

wsperat avatar Jan 31 '24 15:01 wsperat

The first pass of converters was to go as fast as possible to cover as many models as possible. It could certainly be improved. Are you willing to contribute?

sdpython avatar Feb 01 '24 13:02 sdpython

@sdpython Absolutely! However, I'd like to understand the reason behind doing it this way, mostly because I'm not sure how the ONNX runtime deals with the duplicated coefficients.

wsperat avatar Feb 02 '24 13:02 wsperat

This was one mayn years. I don't remember the reasons with everything we did. This PR https://github.com/onnx/onnx/pull/5874 changes the way TreeEnsembleRegressor and TreeEnsembleClassifier are defined. It is now one single operator. We could do the same with LinearClassifier and LinearRegressor or just deprecated them as they can be expressed with regular operators.

A change involves three steps:

  • changes onnx standard if needed
  • implement a kernel in onnxruntime which runs the modified or added operator in onnx
  • update the converting library to use the new operator or the new way to convert regression or classification.

How would you like to contribute?

xadupre avatar Feb 08 '24 13:02 xadupre