Converted RandomForestClassifier has wrong prob when having multiple outputs

Open icyblade opened this issue 2 years ago • 0 comments

Description

When a RandomForestClassifier is trained on multiple outputs, the probabilities returned by the converted ONNX model are incorrect: they do not even sum to 100%.

Repro

Code

import numpy as np
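import sklearn.ensemble  # import the submodule explicitly; a bare "import sklearn" does not guarantee it is loaded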
import skl2onnx
import onnxruntime

print(np.__version__)
print(sklearn.__version__)
print(skl2onnx.__version__)
print(onnxruntime.__version__)

np.random.seed(0)
model = sklearn.ensemble.RandomForestClassifier().fit(
    X=np.random.randint(0, 3, size=(64, 2)),  
    y=np.random.randint(0, 3, size=(64, 2)),  # (64, 1) is fine
)
print(model.predict_proba([[1, 1]]))

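# Convert the fitted model; zipmap=False returns probabilities as a plain tensor.
onnx_model = skl2onnx.convert_sklearn(
    model=model,
    initial_types=[('X', skl2onnx.common.data_types.Int64TensorType(shape=[None, 2]))],
    options={'zipmap': False},
)
sess = onnxruntime.InferenceSession(onnx_model.SerializeToString())
# Feed an explicit int64 array so the input matches the declared Int64TensorType.
print(sess.run(None, {'X': np.array([[1, 1]], dtype=np.int64)}))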

Output

1.21.6
1.0.2
1.11.1
1.9.0
[array([[0.44302889, 0.4279672 , 0.12900391]]), array([[0.44302889, 0.42658636, 0.13038475]])]
[array([[0, 0]], dtype=int64), array([[[0.97, 0.94, 0.59]], [[0.97, 0.97, 0.62]]], dtype=float32)]

As you can see, in the first result (from the sklearn model) the probabilities for each output correctly sum to 100%, but in the second (from the ONNX model) they do not: the two rows sum to roughly 2.50 and 2.56.
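The mismatch can be checked mechanically. The following is a minimal sketch that uses only the values printed in the Output section; the helper names `row_sums` and `sums_to_one` and the tolerance are assumptions made for illustration, not part of sklearn, skl2onnx, or onnxruntime.

```python
import math

# Values copied from the Output section above, one entry per target.
sklearn_probs = [
    [[0.44302889, 0.4279672, 0.12900391]],   # first target
    [[0.44302889, 0.42658636, 0.13038475]],  # second target
]
onnx_probs = [
    [[0.97, 0.94, 0.59]],  # first target
    [[0.97, 0.97, 0.62]],  # second target
]


def row_sums(probs):
    """Sum the class probabilities of every (target, sample) row."""
    return [sum(row) for target in probs for row in target]


def sums_to_one(probs, tol=1e-6):
    """Hypothetical helper: True if every row sums to 1 within tol."""
    return all(math.isclose(s, 1.0, abs_tol=tol) for s in row_sums(probs))


print("sklearn:", row_sums(sklearn_probs), sums_to_one(sklearn_probs))
print("onnx:   ", row_sums(onnx_probs), sums_to_one(onnx_probs))
```

The same check could presumably be applied directly to the arrays returned by `predict_proba` and `sess.run` instead of the hard-coded lists.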

Jul 21 '22 07:07 icyblade
