
xgboost tweedie loss predictions do not match

Open · Maggie1216 opened this issue · 2 comments

Hi team, I'm using hummingbird_ml==0.4.3 and xgboost==1.5.2, and testing XGBRegressor predictions with the reg:tweedie objective.

import xgboost as xgb
from hummingbird.ml import convert
from sklearn.datasets import load_diabetes

train_x, train_y = load_diabetes(return_X_y=True)

# Train an XGBoost regressor with the Tweedie objective
xgb_tweedie = xgb.XGBRegressor(objective='reg:tweedie', n_estimators=50, tweedie_variance_power=1.8)
xgb_tweedie.fit(train_x, train_y)
print(xgb_tweedie.predict(train_x[:10]))

# Convert to PyTorch with Hummingbird and compare predictions
xgb_tweedie_torch = convert(xgb_tweedie, 'pytorch', extra_config={'post_transform': 'TWEEDIE'})
print(xgb_tweedie_torch.predict(train_x[:10]))

It prints:

[160.32375  73.65087 140.53572 208.20435 115.15947  99.853676 125.59772  64.26746 110.12681 298.41394]
[528.6581  242.85928 463.40848 686.5432  379.7321  329.2624  414.15146 211.91847 363.13666 984.0033 ]

After some analysis (I generated 1000 different regression datasets and also tried different values of tweedie_variance_power, etc.), I found that the converted xgb_tweedie_torch predictions are always 3.2974 times the original xgb_tweedie predictions. For example, 160.32375 * 3.2974 = 528.6581. I wonder why this is the case?
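Here is a minimal sketch of that check, reusing xgb_tweedie and xgb_tweedie_torch from the snippet above: it computes the elementwise ratio of converted to original predictions and confirms the factor is constant.

import numpy as np

# Elementwise ratio of converted predictions to original predictions;
# if the discrepancy is a constant multiplicative factor, every entry matches.
orig = xgb_tweedie.predict(train_x[:10])
conv = xgb_tweedie_torch.predict(train_x[:10])
ratio = conv / orig
print(ratio)                          # ~3.2974 in every position
print(np.allclose(ratio, ratio[0]))   # True if the factor is constant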

Maggie1216 · Mar 14 '23 18:03

Hi @Maggie1216, thank you for the detailed example! It's possible that our implementation of Tweedie does not cover some cases. We'll add it to the backlog!

ksaur · Mar 15 '23 16:03

The constant 3.2974 happens to be 2 * exp(0.5), and 0.5 is the default base_score in XGBoost models. I suspect this discrepancy is related to how the base_score is handled in transforms.
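A quick sanity check of that arithmetic, under the assumption that reg:tweedie applies base_score through its log link (prediction = exp(margin + ln(base_score))) while the converter adds base_score directly on the margin scale (prediction = exp(margin + base_score)): the ratio of the two would be exp(0.5) / 0.5 = 2 * exp(0.5), exactly the observed constant.

import math

base_score = 0.5   # XGBoost default
m = 1.234          # arbitrary margin (sum of tree outputs)

# base_score entering through the log link of reg:tweedie
linked = math.exp(m + math.log(base_score))   # == base_score * exp(m)

# hypothesized mishandling: base_score added directly on the margin scale
raw_added = math.exp(m + base_score)

print(raw_added / linked)    # 3.2974... regardless of m
print(2 * math.exp(0.5))     # 3.2974..., the observed constant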

gorkemozkaya · Jul 12 '23 20:07