hummingbird
Multiclass rounding errors
With multiclass datasets (such as covtype or iris), the converted model's predicted probabilities sometimes differ from scikit-learn's by more than rounding tolerance:
import numpy as np
import torch
from hummingbird import convert_sklearn
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import fetch_covtype
X, y = fetch_covtype(return_X_y=True)
nrows=15000
X = X[0:nrows]
y = y[0:nrows]
X_torch = torch.from_numpy(X).float()
model = RandomForestClassifier(n_estimators=10, max_depth=10)
model.fit(X, y)
pytorch_model = convert_sklearn(
    model,
    extra_config={"tree_implementation": "gemm"})
skl = model.predict_proba(X)
pytorch_model.to('cuda')
hum_gpu = pytorch_model(X_torch.to('cuda'))
np.testing.assert_allclose(skl, hum_gpu[1].data.to('cpu').numpy(), rtol=1e-6, atol=1e-6)
This gives the following error:
AssertionError:
Not equal to tolerance rtol=1e-06, atol=1e-06
Mismatched elements: 332 / 105000 (0.316%)
Max absolute difference: 0.11943346
Max relative difference: 5.82971106
x: array([[0.121156, 0.200913, 0.008188, ..., 0.643138, 0.00637 , 0.020236],
[0.110779, 0.207474, 0.008188, ..., 0.646954, 0.00637 , 0.020236],
[0.1959 , 0.707151, 0.008188, ..., 0.050266, 0.00637 , 0.032125],...
y: array([[0.121156, 0.200913, 0.008188, ..., 0.643138, 0.00637 , 0.020236],
[0.110779, 0.207474, 0.008188, ..., 0.646954, 0.00637 , 0.020236],
[0.1959 , 0.707152, 0.008188, ..., 0.050266, 0.00637 , 0.032125],...
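To narrow down which rows are affected, a small diagnostic like the one below might help. It is only a sketch: `report_mismatches` is a hypothetical helper (not part of hummingbird or NumPy), and the toy arrays stand in for `skl` and the converted model's output. It applies the same elementwise criterion that `np.testing.assert_allclose` documents, `|actual - desired| <= atol + rtol * |desired|`, and returns the indices of the rows that fail it.

```python
import numpy as np

def report_mismatches(actual, desired, rtol=1e-6, atol=1e-6):
    """Hypothetical helper: summarize where two arrays disagree."""
    actual = np.asarray(actual, dtype=np.float64)
    desired = np.asarray(desired, dtype=np.float64)
    abs_diff = np.abs(actual - desired)
    # Same elementwise test as np.testing.assert_allclose:
    # an element passes when |actual - desired| <= atol + rtol * |desired|.
    bad = abs_diff > atol + rtol * np.abs(desired)
    return {
        "n_mismatched": int(bad.sum()),
        "n_total": int(bad.size),
        "max_abs_diff": float(abs_diff.max()),
        # Row indices (samples) with at least one failing class probability.
        "rows": np.unique(np.nonzero(bad)[0]),
    }

# Toy data standing in for the two predict_proba outputs.
desired = np.array([[0.2, 0.8], [0.5, 0.5], [0.9, 0.1]])
actual = desired.copy()
actual[1] = [0.6, 0.4]   # a real disagreement on row 1
actual[2, 0] += 1e-8     # float noise, well inside the tolerance

report = report_mismatches(actual, desired)
print(report)
```

In the repro above this could be called as `report_mismatches(skl, hum_gpu[1].data.to('cpu').numpy())`, and the returned `rows` used to index `X` so that rows with large differences can be inspected separately from those that are off by about 1e-6.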
I'm interested in solving this issue, can you assign this to me? @interesaaat