Duplicated coefficients in LinearClassifier
I recently inherited a legacy project that contains a Logistic Regression for binary classification, stored in ONNX format and converted with skl2onnx. When trying to do a bit of model introspection, I was surprised to find that both the model coefficients and the intercept are stored twice, where the second copy seems to be the negative of the first. Looking into the conversion code, I found the following lines in skl2onnx.operator_converters.linear_classifier:
```python
op = operator.raw_operator
coefficients = op.coef_.flatten().astype(float).tolist()
classes = get_label_classes(scope, op)
number_of_classes = len(classes)
# some more code
if number_of_classes == 2:
    coefficients = list(map(lambda x: -1 * x, coefficients)) + coefficients
    intercepts = list(map(lambda x: -1 * x, intercepts)) + intercepts
```
This certainly explains the duplication, but I was wondering about the reason for storing values in this way. Thanks!
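For context, the numerical effect of the duplication can be reproduced with a small numpy sketch. The ai.onnx.ml LinearClassifier operator computes one raw score per class and then applies its post_transform to each score; assuming that post_transform is LOGISTIC (which I believe is what skl2onnx sets for a LogisticRegression), the negated copy turns sklearn's single coefficient vector into one probability per class. The values below are made up:

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

# Toy binary logistic-regression parameters (illustrative values only).
coef = np.array([0.5, -1.2, 0.3])   # sklearn keeps a single coefficient vector for binary problems
intercept = 0.1
x = np.array([1.0, 2.0, -0.5])

# What the quoted converter code stores in the LinearClassifier node:
# negated parameters for class 0, original parameters for class 1.
coefficients = np.concatenate([-coef, coef]).reshape(2, -1)
intercepts = np.array([-intercept, intercept])

# One raw score per class: [-z, z].
scores = coefficients @ x + intercepts

# With post_transform=LOGISTIC each score goes through the logistic function,
# giving [1 - p, p], i.e. a probability for each of the two classes.
proba_per_class = sigmoid(scores)

# sklearn's predict_proba for the positive class gives the same p.
p_sklearn = sigmoid(x @ coef + intercept)
print(proba_per_class, p_sklearn)   # proba_per_class[1] == p_sklearn
```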
The first pass of the converters was written to cover as many models as possible, as quickly as possible. It could certainly be improved. Are you willing to contribute?
@sdpython Absolutely! However, I'd like to understand the reason behind doing it this way, mostly because I'm not sure how the ONNX runtime deals with the duplicated coefficients.
This was many years ago. I don't remember the reasons for everything we did. This PR https://github.com/onnx/onnx/pull/5874 changes the way TreeEnsembleRegressor and TreeEnsembleClassifier are defined: they are now a single operator. We could do the same with LinearClassifier and LinearRegressor, or just deprecate them since they can be expressed with regular operators.
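As a rough illustration of what "expressed with regular operators" could mean, a binary logistic regression can be written with standard ONNX operators (MatMul, Add, Sigmoid) instead of the ai.onnx.ml LinearClassifier. The sketch below only produces the positive-class probability and skips labels and ZipMap; the names, shapes, and opset are mine:

```python
import numpy as np
from onnx import TensorProto, checker, helper, numpy_helper

# Illustrative parameters of a fitted binary logistic regression (3 features).
coef = np.array([[0.5], [-1.2], [0.3]], dtype=np.float32)   # shape (n_features, 1)
intercept = np.array([0.1], dtype=np.float32)

graph = helper.make_graph(
    nodes=[
        # score = X @ coef + intercept
        helper.make_node("MatMul", ["X", "coef"], ["matmul_out"]),
        helper.make_node("Add", ["matmul_out", "intercept"], ["score"]),
        # probability of the positive class, like predict_proba[:, 1]
        helper.make_node("Sigmoid", ["score"], ["probability"]),
    ],
    name="binary_logreg_without_linearclassifier",
    inputs=[helper.make_tensor_value_info("X", TensorProto.FLOAT, [None, 3])],
    outputs=[helper.make_tensor_value_info("probability", TensorProto.FLOAT, [None, 1])],
    initializer=[
        numpy_helper.from_array(coef, name="coef"),
        numpy_helper.from_array(intercept, name="intercept"),
    ],
)
model = helper.make_model(graph, opset_imports=[helper.make_opsetid("", 18)])
checker.check_model(model)
```

A full replacement would also need the label output (e.g. via a threshold or ArgMax) and the multiclass case, which is part of what a converter change would have to cover.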
A change involves three steps:
- change the onnx standard if needed
- implement a kernel in onnxruntime that runs the modified or added operator
- update the converting library (skl2onnx) to use the new operator or the new way to convert regression or classification (a rough sketch of this step follows the list)
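For the third item, the converter-side change might look roughly like the following. This is only a hypothetical outline using skl2onnx's usual scope/container helpers, not a working replacement: it handles a binary LogisticRegression, writes a single positive-class probability column, and ignores labels, ZipMap, and the multiclass case:

```python
from skl2onnx.proto import onnx_proto


def convert_binary_logreg_with_regular_ops(scope, operator, container):
    # Hypothetical converter sketch: emit MatMul + Add + Sigmoid instead of
    # the ai.onnx.ml LinearClassifier.
    op = operator.raw_operator
    X = operator.inputs[0].full_name

    # Store the sklearn parameters as initializers, shapes (n_features, 1) and (1,).
    coef_name = scope.get_unique_variable_name("coef")
    intercept_name = scope.get_unique_variable_name("intercept")
    container.add_initializer(
        coef_name, onnx_proto.TensorProto.FLOAT,
        [op.coef_.shape[1], 1], op.coef_.ravel().tolist())
    container.add_initializer(
        intercept_name, onnx_proto.TensorProto.FLOAT,
        [1], op.intercept_.tolist())

    # score = X @ coef + intercept, then a sigmoid for the positive-class probability.
    matmul_out = scope.get_unique_variable_name("matmul_out")
    score = scope.get_unique_variable_name("score")
    container.add_node("MatMul", [X, coef_name], matmul_out,
                       name=scope.get_unique_operator_name("MatMul"))
    container.add_node("Add", [matmul_out, intercept_name], score,
                       name=scope.get_unique_operator_name("Add"))
    # A real converter would also fill the label output (operator.outputs[0]) and
    # produce one probability column per class; this sketch stops at the sigmoid.
    container.add_node("Sigmoid", score, operator.outputs[1].full_name,
                       name=scope.get_unique_operator_name("Sigmoid"))
```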
How would you like to contribute?