Regression value in dtreeviz doesn't equal the leaf weight in the xgboost dumped model

Open GZYZG opened this issue 3 years ago • 4 comments

import xgboost as xgb
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split

diabetes = load_diabetes()
feature_names = diabetes.feature_names
X = diabetes.data
Y = diabetes.target
test_ratio = 0.2
seed = 42  # assumed value; `seed` is never defined in the original snippet

x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=test_ratio, random_state=seed)
dtrain = xgb.DMatrix(x_train, y_train, feature_names=feature_names)
dtest = xgb.DMatrix(x_test, y_test, feature_names=feature_names)

params = {
    "objective": "reg:squarederror",
    "booster": "gbtree", 
    "max_depth": 3, 
}
num_estimators = 2
watch_list = [(dtrain, "train"), (dtest, "eval")]

model = xgb.train(params=params, dtrain=dtrain, num_boost_round=num_estimators, evals=watch_list)

model.dump_model("diabetes_reg_squarederror.txt")

The content of diabetes_reg_squarederror.txt is:

image
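The leaf weights can also be read from the dump programmatically rather than by eye. The following is a minimal sketch that parses XGBoost's text dump format (`booster[i]:` headers followed by `id:[split] ...` and `id:leaf=value` lines). The helper names `parse_leaf_values` and `sum_leaf_scores` are hypothetical, and the sample dump string is made up for illustration; it is not the content of the file above.

```python
import re

# Hypothetical two-tree dump in XGBoost's text format; values are illustrative only.
SAMPLE_DUMP = """booster[0]:
0:[bmi<0.005] yes=1,no=2,missing=1
\t1:leaf=10.0
\t2:leaf=20.0
booster[1]:
0:[s5<0.02] yes=1,no=2,missing=1
\t1:leaf=-1.5
\t2:leaf=2.5
"""

TREE_HEADER = re.compile(r"^booster\[(\d+)\]:")
LEAF_LINE = re.compile(r"^\s*(\d+):leaf=([-+0-9.eE]+)")


def parse_leaf_values(dump_text):
    """Return {tree_index: {node_id: leaf_value}} for a text dump."""
    leaves = {}
    current_tree = None
    for line in dump_text.splitlines():
        header = TREE_HEADER.match(line)
        if header:
            # A new tree starts; collect its leaves in a fresh dict.
            current_tree = int(header.group(1))
            leaves[current_tree] = {}
            continue
        leaf = LEAF_LINE.match(line)
        if leaf and current_tree is not None:
            leaves[current_tree][int(leaf.group(1))] = float(leaf.group(2))
    return leaves


def sum_leaf_scores(leaves, leaf_ids):
    """Sum the leaf values selected by one row of pred_leaf output."""
    return sum(leaves[tree][int(node)] for tree, node in enumerate(leaf_ids))


leaves = parse_leaf_values(SAMPLE_DUMP)
print(sum_leaf_scores(leaves, [2.0, 1.0]))  # adds leaf 2 of tree 0 and leaf 1 of tree 1
```

For the real file, the text passed to `parse_leaf_values` would be the content of `diabetes_reg_squarederror.txt`, and `leaf_ids` would be one row of the `pred_leaf` output.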

Prediction of model:

pred = model.predict(dtest.slice(range(2)))
# Output is: array([106.27969 , 124.411964], dtype=float32)

pred_leaf = model.predict(dtest.slice(range(2)), iteration_range=(0,2), pred_leaf=1)
# Output is: 
#array([[14., 12.],
#       [14., 13.]], dtype=float32)

For the sample x_test[0], the leaf indices are [14, 12]. The sum of the leaf scores is 67.5542221 + 38.2254753 = 105.7796974, which is nearly equal to the prediction. But in the visualized tree:
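The remaining gap between the summed leaf scores and the prediction is almost exactly 0.5. One plausible reading, assuming the model used a `base_score` of 0.5 (the snippet does not set it, and 0.5 was the XGBoost default at the time; other versions may differ), is that the prediction is the base score plus the leaf scores. A small check of that arithmetic:

```python
# Leaf scores quoted above for leaves 14 (tree 0) and 12 (tree 1).
leaf_scores = [67.5542221, 38.2254753]

# Assumption: base_score was left at 0.5; the issue does not state it.
base_score = 0.5

# Prediction reported by model.predict for the first test row.
reported_prediction = 106.27969

reconstructed = base_score + sum(leaf_scores)
gap = abs(reconstructed - reported_prediction)

print(f"reconstructed={reconstructed:.5f}, gap={gap:.6f}")
```

Under that assumption the reconstructed value agrees with the reported prediction to within float32 rounding.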

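from dtreeviz.trees import dtreeviz  # import path of the dtreeviz 1.x API, assumed from the call below

viz = dtreeviz(model, 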
               tree_index=0,
               x_data=x_train,
               y_data=y_train,
               X=x_test[0],  
               fancy=1,
               target_name='target',
               feature_names=feature_names, 
               title=f"{params['objective']} - Diabetes data set",
               scale=1.5)

image

As we can see, the target value of sample x_test[0] in booster[0] is 228.43, but the leaf score is 67.5542221 according to the dumped model.
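The two numbers may not be measuring the same thing. An XGBoost leaf weight is a regularized, learning-rate-scaled step toward the targets, not the mean target of the samples in the leaf. The sketch below applies the standard leaf-weight formula for squared error, using XGBoost's default `eta=0.3` and `lambda=1` (neither is overridden in `params`) and assuming a starting prediction of 0.5. The sample count of 82 is a hypothetical value chosen for illustration; the issue does not report it. Whether dtreeviz displays the mean of `y_data` in the leaf is likewise an assumption, not something confirmed in this thread.

```python
def squared_error_leaf_weight(targets, current_pred, eta=0.3, reg_lambda=1.0):
    """Leaf weight for reg:squarederror: -eta * sum(g) / (sum(h) + lambda)."""
    gradient_sum = sum(current_pred - y for y in targets)  # g_i = pred - y
    hessian_sum = float(len(targets))                      # h_i = 1 for squared error
    return -eta * gradient_sum / (hessian_sum + reg_lambda)


# Hypothetical leaf: 82 training samples whose mean target is 228.43.
targets = [228.43] * 82

leaf_mean = sum(targets) / len(targets)
leaf_weight = squared_error_leaf_weight(targets, current_pred=0.5)

print(f"mean target in leaf: {leaf_mean:.2f}")
print(f"first-tree leaf weight: {leaf_weight:.4f}")
```

With these assumed inputs, the leaf weight lands close to the 67.5542221 in the dump while the mean target stays at 228.43, so a displayed mean and a dumped weight could both be correct for the same leaf.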

I'm confused about this problem, please help me, thanks.

GZYZG avatar Feb 26 '22 14:02 GZYZG

@tlapusan could this be related to pruning again somehow? in other words, we visualize it correctly but we get the wrong prediction somehow?

parrt avatar Feb 26 '22 18:02 parrt

> @tlapusan could this be related to pruning again somehow? in other words, we visualize it correctly but we get the wrong prediction somehow?

I don't know how dtreeviz gets the leaf score of an xgboost booster. Could it be related to how dtreeviz parses the model?

GZYZG avatar Feb 27 '22 07:02 GZYZG

hi. yeah, no doubt as they started pruning trees, we might have to look at our shadow model.

parrt avatar Feb 27 '22 17:02 parrt

Hi @GZYZG

@parrt I have to check this, but I do remember that we haven't implemented the weighted tree version for xgboost.

Tudor

tlapusan avatar Feb 27 '22 17:02 tlapusan