dtreeviz
Regression value in dtreeviz doesn't equal the leaf weight of the dumped xgboost model
import xgboost as xgb
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
diabetes = load_diabetes()
feature_names = diabetes.feature_names
X = diabetes.data
Y = diabetes.target
test_ratio = 0.2
seed = 42  # assumed value; the original snippet uses `seed` without defining it
x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=test_ratio, random_state=seed)
dtrain = xgb.DMatrix(x_train, y_train, feature_names=feature_names)
dtest = xgb.DMatrix(x_test, y_test, feature_names=feature_names)
params = {
"objective": "reg:squarederror",
"booster": "gbtree",
"max_depth": 3,
}
num_estimators = 2
watch_list = [(dtrain, "train"), (dtest, "eval")]
model = xgb.train(params=params, dtrain=dtrain, num_boost_round=num_estimators, evals=watch_list)
model.dump_model("diabetes_reg_squarederror.txt")
The content of diabetes_reg_squarederror.txt is:
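The same trees can also be inspected in memory instead of reading the dump file. A minimal sketch, assuming only the standard Booster.get_dump method (with_stats=True additionally prints the gain and cover statistics for each node):
# Sketch: print each boosted tree as text, including gain/cover statistics.
for tree_idx, tree_text in enumerate(model.get_dump(with_stats=True)):
    print(f"booster[{tree_idx}]")
    print(tree_text)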
Predictions of the model:
pred = model.predict(dtest.slice(range(2)))
# Output is: array([106.27969 , 124.411964], dtype=float32)
pred_leaf = model.predict(dtest.slice(range(2)), iteration_range=(0,2), pred_leaf=1)
# Output is:
#array([[14., 12.],
# [14., 13.]], dtype=float32)
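The leaf values behind those indices can be looked up directly from the booster and the prediction rebuilt by hand. A minimal sketch, assuming the standard Booster.trees_to_dataframe method (requires pandas) and xgboost's default base_score of 0.5 for reg:squarederror:
# Sketch: look up the leaf values for tree 0 / node 14 and tree 1 / node 12,
# then add the default base_score to reconstruct the prediction for x_test[0].
df = model.trees_to_dataframe()
leaves = df[df["Feature"] == "Leaf"]            # leaf rows; the leaf value is reported in the "Gain" column
leaf_t0 = leaves[(leaves["Tree"] == 0) & (leaves["Node"] == 14)]["Gain"].iloc[0]
leaf_t1 = leaves[(leaves["Tree"] == 1) & (leaves["Node"] == 12)]["Gain"].iloc[0]
print(leaf_t0 + leaf_t1 + 0.5)                  # should be close to 106.27969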
For the sample x_test[0], the leaf indices are [14, 12], and the sum of the corresponding leaf scores is 67.5542221 + 38.2254753 = 105.7796974, which is nearly equal to the prediction (the remaining difference of about 0.5 presumably comes from xgboost's default base_score of 0.5). But in the tree visualized with dtreeviz:
from dtreeviz.trees import dtreeviz  # dtreeviz 1.x-style import, assumed from the call signature below

viz = dtreeviz(model,
               tree_index=0,
               x_data=x_train,
               y_data=y_train,
               X=x_test[0],
               fancy=True,
               target_name='target',
               feature_names=feature_names,
               title=f"{params['objective']} - Diabetes data set",
               scale=1.5)
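The returned object can then be rendered or written to disk. A brief usage sketch, assuming the DTreeViz object returned by the dtreeviz 1.x call above (the filename is arbitrary):
viz.save("diabetes_booster0.svg")   # write the rendering to an SVG file
# viz.view()                        # or open it in the system's default viewer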
As we can see, the target value shown for sample x_test[0] in booster[0] is 228.43, but the leaf score is 67.5542221 according to the dumped model.
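For what it's worth, one purely speculative check is whether the value shown by dtreeviz matches the mean target of the training rows routed to that same leaf. The sketch below uses only standard xgboost and numpy calls; whether dtreeviz actually derives its node value this way is an assumption:
# Sketch: find the training rows that land in leaf 14 of booster[0] and
# compare their mean target with the 228.43 displayed by dtreeviz.
import numpy as np
train_leaves = model.predict(dtrain, pred_leaf=True)   # shape: (n_train_rows, n_trees)
in_leaf_14 = train_leaves[:, 0] == 14                  # rows ending in leaf 14 of tree 0
print(np.mean(y_train[in_leaf_14]))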
I'm confused about this; any help would be appreciated, thanks.
@tlapusan could this be related to pruning again somehow? In other words, we visualize it correctly but we get the wrong prediction somehow?
I don't know how dtreeviz gets the leaf score of the xgboost booster; could it be related to how dtreeviz parses the model?
Hi. Yeah, no doubt: since they started pruning trees, we might have to look at our shadow model.
Hi @GZYZG
@parrt I have to check this, but I do remember that we don't have the weighted tree version implemented for xgboost.
Tudor