
Multiclass rounding errors

Open ksaur opened this issue 4 years ago • 1 comments

With multiclass datasets (such as covtype or iris), we sometimes get rounding errors:

import numpy as np
import torch
from hummingbird import convert_sklearn
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import fetch_covtype

X, y = fetch_covtype(return_X_y=True)
nrows=15000
X = X[0:nrows]
y = y[0:nrows]
X_torch = torch.from_numpy(X).float()

model = RandomForestClassifier(n_estimators=10, max_depth=10)
model.fit(X, y)

pytorch_model = convert_sklearn(
    model,
    extra_config={"tree_implementation": "gemm"})


skl = model.predict_proba(X)
pytorch_model.to('cuda')
hum_gpu = pytorch_model(X_torch.to('cuda'))

np.testing.assert_allclose(skl, hum_gpu[1].data.to('cpu').numpy(), rtol=1e-6, atol=1e-6)

gives the error:


AssertionError: 
Not equal to tolerance rtol=1e-06, atol=1e-06

Mismatched elements: 332 / 105000 (0.316%)
Max absolute difference: 0.11943346
Max relative difference: 5.82971106
 x: array([[0.121156, 0.200913, 0.008188, ..., 0.643138, 0.00637 , 0.020236],
       [0.110779, 0.207474, 0.008188, ..., 0.646954, 0.00637 , 0.020236],
       [0.1959  , 0.707151, 0.008188, ..., 0.050266, 0.00637 , 0.032125],...
 y: array([[0.121156, 0.200913, 0.008188, ..., 0.643138, 0.00637 , 0.020236],
       [0.110779, 0.207474, 0.008188, ..., 0.646954, 0.00637 , 0.020236],
       [0.1959  , 0.707152, 0.008188, ..., 0.050266, 0.00637 , 0.032125],...
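To narrow this down, it may help to separate the tiny last-digit differences (such as 0.707151 vs 0.707152) from the larger ones behind the 0.119 maximum. The sketch below is only a diagnostic aid and not part of Hummingbird: `summarize_mismatches` is a hypothetical helper name. It applies the same tolerance test as `assert_allclose` and reports which rows fail and whether the predicted class changes.

```python
import numpy as np


def summarize_mismatches(actual, desired, rtol=1e-6, atol=1e-6):
    """Hypothetical helper: describe where two probability arrays disagree."""
    actual = np.asarray(actual, dtype=np.float64)
    desired = np.asarray(desired, dtype=np.float64)

    # Same criterion as np.testing.assert_allclose:
    # |actual - desired| <= atol + rtol * |desired|
    close = np.isclose(actual, desired, rtol=rtol, atol=atol)
    abs_diff = np.abs(actual - desired)

    return {
        # Number of individual elements outside the tolerance.
        "n_mismatched": int((~close).sum()),
        # Largest element-wise absolute difference.
        "max_abs_diff": float(abs_diff.max()),
        # Row indices that contain at least one mismatched element.
        "bad_rows": np.unique(np.nonzero(~close)[0]),
        # Rows where the most probable class differs between the two arrays.
        "argmax_flips": int(
            (actual.argmax(axis=1) != desired.argmax(axis=1)).sum()
        ),
    }
```

With the variables from the script above, it could be called as `summarize_mismatches(skl, hum_gpu[1].data.to('cpu').numpy())`, and the rows in `bad_rows` can then be inspected individually in `X`.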

ksaur avatar Apr 21 '20 00:04 ksaur

I'm interested in solving this issue, can you assign this to me? @interesaaat

gyeongin avatar Jun 22 '20 04:06 gyeongin