
xgboost tweedie loss predictions do not match

Open · Maggie1216 opened this issue · 2 comments

Hi team, I'm using hummingbird_ml==0.4.3 and xgboost==1.5.2, and testing XGBRegressor predictions with the reg:tweedie objective.

import xgboost as xgb
from hummingbird.ml import convert
from sklearn.datasets import load_diabetes

train_x, train_y = load_diabetes(return_X_y=True)

# Train an XGBoost regressor with the Tweedie objective
xgb_tweedie = xgb.XGBRegressor(objective='reg:tweedie', n_estimators=50, tweedie_variance_power=1.8)
xgb_tweedie.fit(train_x, train_y)
print(xgb_tweedie.predict(train_x[:10]))

# Convert to PyTorch with Hummingbird and compare predictions
xgb_tweedie_torch = convert(xgb_tweedie, 'pytorch', extra_config={'post_transform': 'TWEEDIE'})
print(xgb_tweedie_torch.predict(train_x[:10]))

It prints:

[160.32375  73.65087 140.53572 208.20435 115.15947  99.853676 125.59772  64.26746 110.12681 298.41394]
[528.6581  242.85928 463.40848 686.5432  379.7321  329.2624  414.15146 211.91847 363.13666 984.0033 ]

After some analysis (I generated 1000 different regression datasets and also tried different values of tweedie_variance_power, etc.), I found that the converted xgb_tweedie_torch predictions are always 3.2974 times the original xgb_tweedie predictions. For example, 160.32375 * 3.2974 = 528.6581. I wonder why this is the case?
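Here is a minimal sketch of that check, reusing xgb_tweedie and xgb_tweedie_torch from the snippet above: it computes the elementwise ratio of converted to original predictions and confirms the factor is constant.

import numpy as np

# Elementwise ratio of converted predictions to original predictions;
# if the discrepancy is a constant multiplicative factor, every entry matches.
orig = xgb_tweedie.predict(train_x[:10])
conv = xgb_tweedie_torch.predict(train_x[:10])
ratio = conv / orig
print(ratio)                          # ~3.2974 in every position
print(np.allclose(ratio, ratio[0]))   # True if the factor is constant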

Maggie1216 · Mar 14 '23 18:03

Hi @Maggie1216, thank you for the detailed example! It's possible that our implementation of Tweedie does not cover some cases. We'll add it to the backlog!

ksaur · Mar 15 '23 16:03

The constant 3.2974 happens to be 2 * exp(0.5), and 0.5 is the default base_score in XGBoost models. I suspect this discrepancy is related to how the base_score is handled in transforms.
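A quick sanity check of that arithmetic, under the assumption that reg:tweedie applies base_score through its log link (prediction = exp(margin + ln(base_score))) while the converter adds base_score directly on the margin scale (prediction = exp(margin + base_score)): the ratio of the two would be exp(0.5) / 0.5 = 2 * exp(0.5), exactly the observed constant.

import math

base_score = 0.5   # XGBoost default
m = 1.234          # arbitrary margin (sum of tree outputs)

# base_score entering through the log link of reg:tweedie
linked = math.exp(m + math.log(base_score))   # == base_score * exp(m)

# hypothesized mishandling: base_score added directly on the margin scale
raw_added = math.exp(m + base_score)

print(raw_added / linked)    # 3.2974... regardless of m
print(2 * math.exp(0.5))     # 3.2974..., the observed constant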

gorkemozkaya · Jul 12 '23 20:07