xgboost tweedie loss predictions do not match
Hi team, I'm using hummingbird_ml==0.4.3 and xgboost==1.5.2, and I'm testing predictions from an XGBRegressor trained with the reg:tweedie objective.
import xgboost as xgb
from hummingbird.ml import convert
from sklearn.datasets import load_diabetes

# Train an XGBoost regressor with the tweedie objective.
train_x, train_y = load_diabetes(return_X_y=True)
xgb_tweedie = xgb.XGBRegressor(objective='reg:tweedie', n_estimators=50, tweedie_variance_power=1.8)
xgb_tweedie.fit(train_x, train_y)
print(xgb_tweedie.predict(train_x[:10]))

# Convert to PyTorch with Hummingbird and predict on the same rows.
xgb_tweedie_torch = convert(xgb_tweedie, 'pytorch', extra_config={'post_transform': 'TWEEDIE'})
print(xgb_tweedie_torch.predict(train_x[:10]))
It prints:
[160.32375 73.65087 140.53572 208.20435 115.15947 99.853676 125.59772 64.26746 110.12681 298.41394]
[528.6581 242.85928 463.40848 686.5432 379.7321 329.2624 414.15146 211.91847 363.13666 984.0033]
After some analysis (I generated 1000 different regression datasets and also tried different tweedie_variance_power values, etc.), I found that the converted xgb_tweedie_torch predictions are always 3.2974 times the original xgb_tweedie predictions. For example, 160.32375 * 3.2974 = 528.6581. I wonder why this is the case?
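For anyone reproducing this, a quick sanity check of the constant factor, reusing the variables from the script above, could look like this (the helper names here are just illustrative):

import numpy as np

# Compare the two prediction vectors element-wise; the ratio is constant.
orig = xgb_tweedie.predict(train_x[:10])
conv = xgb_tweedie_torch.predict(train_x[:10])
print(np.round(conv / orig, 4))  # every entry is ~3.2974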
Hi @Maggie1216, thank you for the detailed example! It's possible that our tweedie implementation doesn't cover this case. We'll add it to the backlog!
The constant 3.2974 happens to be 2 * exp(0.5), and 0.5 is the default base_score in XGBoost models. Equivalently, 2 * exp(0.5) = exp(0.5) / 0.5, which is exactly the ratio you would get if the converted model added base_score to the raw margin before the exp, while XGBoost folds it in as log(base_score). I suspect this discrepancy is related to how the base_score is handled in transforms.
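A minimal numeric sketch of that hypothesis (the margin values and the "buggy" handling below are assumptions for illustration, not Hummingbird's actual code):

import numpy as np

base_score = 0.5  # XGBoost default
# Hypothetical summed tree outputs, roughly matching the first three predictions above.
margin = np.array([5.77, 4.99, 5.64])

# XGBoost folds base_score into the margin as log(base_score) ...
xgb_pred = np.exp(margin + np.log(base_score))
# ... while the suspected bug adds base_score directly before the exp.
buggy_pred = np.exp(margin + base_score)

print(buggy_pred / xgb_pred)  # constant exp(0.5) / 0.5 = 2 * exp(0.5) ~= 3.2974

If this hypothesis holds, then since the ratio is constant, dividing the converted predictions by 2 * exp(0.5) should recover the original XGBoost outputs as a temporary workaround until a fix lands.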