Gradient boosting implementation
This pull request adds support for scikit-learn's gradient boosting to treeinterpreter. A note about the GBT classifier: a function (usually the logistic function) is applied to convert raw scores to probabilities, so for the GBT classifier: probabilities = score_to_proba( bias + sum(contributions) ).
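As a hedged illustration of that mapping (not part of the PR itself): for a binary GradientBoostingClassifier, decision_function returns the raw score that the interpreter would decompose into bias plus contributions, and the logistic sigmoid maps it to the positive-class probability.

```python
# Hedged illustration (not part of the PR): the raw score from
# decision_function is the "bias + sum(contributions)" quantity, and
# expit (the logistic sigmoid) is the score_to_proba step for binomial deviance.
import numpy as np
from scipy.special import expit  # logistic sigmoid
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=200, random_state=0)
clf = GradientBoostingClassifier(n_estimators=50, random_state=0).fit(X, y)

raw_score = clf.decision_function(X).ravel()   # raw score = bias + sum(contributions)
proba_pos = expit(raw_score)                   # score_to_proba for binomial deviance
assert np.allclose(proba_pos, clf.predict_proba(X)[:, 1])
```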
It also fixes a speed problem encountered with large trees: previously the _predict_tree function visited every node in the tree to predict a single row; now it only walks from the root down to the leaf where the row lands.
This substantially reduces the running time for large trees, since the cost per row is now proportional to the tree depth (roughly O(log(n)) for balanced trees) instead of O(n), n being the number of nodes.
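A minimal sketch of that root-to-leaf descent over sklearn's tree arrays (the helper name leaf_path is illustrative, not the PR's actual function):

```python
# Minimal sketch of the root-to-leaf descent now used by _predict_tree
# (the helper name leaf_path is illustrative, not the PR's actual function).
def leaf_path(tree_, x):
    """Return the node ids visited for a single sample x, root to leaf."""
    node = 0                                     # start at the root
    path = [node]
    while tree_.children_left[node] != -1:       # -1 (TREE_LEAF) marks a leaf
        if x[tree_.feature[node]] <= tree_.threshold[node]:
            node = tree_.children_left[node]
        else:
            node = tree_.children_right[node]
        path.append(node)
    return path  # length bounded by the tree depth, not the node count
```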
What do you think about this code? Please don't hesitate to send feedback, or ask me if anything remains unclear.
Thanks for the commit! Will review and merge
Hi @marcbllv,
I have reviewed your code. It's great to also have the interpreter for gradient boosted trees! However, I found some potential issues.
According to sklearn's user guide here http://scikit-learn.org/stable/modules/ensemble.html#gradient-boosting, GBRT builds the additive model in a forward stagewise fashion:

F_m(x) = F_{m-1}(x) + \gamma_m h_m(x)

where the step length \gamma_m is chosen using line search:

\gamma_m = \arg\min_{\gamma} \sum_{i=1}^{n} L\left(y_i,\; F_{m-1}(x_i) - \gamma \frac{\partial L(y_i, F_{m-1}(x_i))}{\partial F_{m-1}(x_i)}\right)
Thus, to interpret GBM, the step value can't be ignored. I have also checked the source code:
In the _fit_stage function, the following call updates the value of each tree leaf.
```python
loss.update_terminal_regions(tree.tree_, X, y, residual, y_pred,
                             sample_weight, sample_mask,
                             self.learning_rate, k=k)
```
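As a hedged check of that stagewise structure (on the regression side, where it is easiest to verify), the raw prediction of a fitted GradientBoostingRegressor can be rebuilt as the init prediction plus learning_rate times the sum of the per-stage tree predictions, because update_terminal_regions bakes the step into the stored leaf values:

```python
# Hedged sketch: rebuild a GradientBoostingRegressor prediction stage by stage.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_regression(n_samples=200, n_features=5, random_state=0)
est = GradientBoostingRegressor(n_estimators=20, learning_rate=0.1,
                                random_state=0).fit(X, y)

pred = est.init_.predict(X).ravel()              # the "bias" term
for stage_tree in est.estimators_[:, 0]:         # one regression tree per stage
    pred = pred + est.learning_rate * stage_tree.predict(X)

assert np.allclose(pred, est.predict(X))
```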
The details of update_terminal_regions depend on which loss objective is used. I have taken a close look at LeastSquaresError (for GradientBoostingRegressor) and BinomialDeviance (for the binary GradientBoostingClassifier).
For LeastSquaresError, which is typically used for regression problems, things are quite simple. As noted in the code below, the terminal regions need not be updated for least squares. For GradientBoostingRegressor, the feature contributions along each tree branch can therefore be calculated the same way as for RandomForest, because the node values along a branch are consistent (a sketch of that decomposition follows the quoted class).
```python
class LeastSquaresError(RegressionLossFunction):
    """Loss function for least squares (LS) estimation.
    Terminal regions need not to be updated for least squares. """

    def init_estimator(self):
        return MeanEstimator()

    def __call__(self, y, pred, sample_weight=None):
        if sample_weight is None:
            return np.mean((y - pred.ravel()) ** 2.0)
        else:
            return (1.0 / sample_weight.sum() *
                    np.sum(sample_weight * ((y - pred.ravel()) ** 2.0)))

    def negative_gradient(self, y, pred, **kargs):
        return y - pred.ravel()

    def update_terminal_regions(self, tree, X, y, residual, y_pred,
                                sample_weight, sample_mask,
                                learning_rate=1.0, k=0):
        """Least squares does not need to update terminal regions.
        But it has to update the predictions.
        """
        # update predictions
        y_pred[:, k] += learning_rate * tree.predict(X).ravel()

    def _update_terminal_region(self, tree, terminal_regions, leaf, X, y,
                                residual, pred, sample_weight):
        pass
```
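For reference, here is a hedged sketch of that path-wise decomposition for a single regression tree (the helper name tree_contributions is illustrative; the bias is the root value and each split's change in value is credited to the feature it tests):

```python
# Hedged sketch: decompose one regression tree's prediction into
# bias + per-feature contributions by walking the root-to-leaf path.
import numpy as np

def tree_contributions(tree_, x, n_features):
    contribs = np.zeros(n_features)
    node = 0
    bias = tree_.value[node, 0, 0]               # root value = bias
    while tree_.children_left[node] != -1:       # -1 marks a leaf
        feature = tree_.feature[node]
        if x[feature] <= tree_.threshold[node]:
            child = tree_.children_left[node]
        else:
            child = tree_.children_right[node]
        # the change in node value at this split is credited to its feature
        contribs[feature] += tree_.value[child, 0, 0] - tree_.value[node, 0, 0]
        node = child
    # bias + contribs.sum() equals the leaf value, i.e. this tree's prediction
    return bias, contribs
```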
However, for BinomialDeviance, which is used for GradientBoostingClassifier, things are much more complicated. As shown in the code below, the value of each terminal node (leaf) is updated, so the node values along a branch are no longer consistent. When calculating the feature contributions for each tree, the step value can't be ignored: the values of the parent nodes should be adjusted in the same way as the leaves so that all node values are consistent again, and then the same logic can be applied to the modified trees (one possible adjustment is sketched after the snippet).
```python
def _update_terminal_region(self, tree, terminal_regions, leaf, X, y,
                            residual, pred, sample_weight):
    """Make a single Newton-Raphson step.
    our node estimate is given by:
        sum(w * (y - prob)) / sum(w * prob * (1 - prob))
    we take advantage that: y - prob = residual
    """
    terminal_region = np.where(terminal_regions == leaf)[0]
    residual = residual.take(terminal_region, axis=0)
    y = y.take(terminal_region, axis=0)
    sample_weight = sample_weight.take(terminal_region, axis=0)

    numerator = np.sum(sample_weight * residual)
    denominator = np.sum(sample_weight * (y - residual) * (1 - y + residual))

    if denominator == 0.0:
        tree.value[leaf, 0, 0] = 0.0
    else:
        tree.value[leaf, 0, 0] = numerator / denominator
```
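As a hedged illustration of that suggestion (not an implementation), the internal node values could be recomputed bottom-up as sample-weighted averages of the Newton-updated leaves, so that the usual path decomposition becomes valid again:

```python
# Hedged sketch: make internal node values consistent with the updated leaves.
import numpy as np

def recompute_internal_values(tree_):
    values = tree_.value[:, 0, 0].copy()
    weights = tree_.n_node_samples.astype(float)
    # sklearn assigns a node id before its children, so iterating the node
    # array in reverse processes every child before its parent
    for node in range(tree_.node_count - 1, -1, -1):
        left = tree_.children_left[node]
        right = tree_.children_right[node]
        if left != -1:                           # internal node
            values[node] = (weights[left] * values[left] +
                            weights[right] * values[right]) / weights[node]
    return values
```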
To verify this, I modified the GradientBoosting module to store the original trees (whose leaves are not updated) and compared them with the fitted trees taken from bst.estimators_. It turned out that only the leaf values of the original trees had been updated; the values of the parent nodes were unchanged.
Hello, I tried to use your implementation of the package, but it seems out of date after the recent updates. Could you help me use it in my current project?