xgbfir
Where can I find how interaction gains are calculated?
Thanks,
Found this excerpt from the XGBoost docs useful:
To build a tree, the dataset is divided recursively several times. At the end of the process, you get groups of observations. Each division operation is called a split.
Not all splits are equally important. Basically, the first split of a tree will have more impact on the purity than, for instance, the deepest split. Intuitively, we understand that the first split does most of the work, and the following splits focus on smaller parts of the dataset which have been misclassified by the first tree.
The improvement brought by each split can be measured; this is the gain.
XGBoost offers a better representation: feature importance. Feature importance is about averaging the gain of each feature for all splits and all trees.
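The averaging described in that excerpt can be sketched in plain Python. The split list below is made up for illustration; in a real model these (feature, gain) pairs would come from the split nodes of the booster's trees:

```python
from collections import defaultdict

# Toy list of splits parsed from a model dump: (feature, gain).
# Real values would come from the trees of a trained booster.
splits = [
    ("age", 120.0), ("age", 30.0),
    ("income", 80.0),
    ("city", 10.0),
]

def gain_importance(splits):
    """Average gain per feature over all splits (and, implicitly, all trees)."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for feature, gain in splits:
        totals[feature] += gain
        counts[feature] += 1
    return {f: totals[f] / counts[f] for f in totals}

print(gain_importance(splits))
# age averages (120 + 30) / 2 = 75.0
```

With an actual trained model you don't need to do this by hand: `booster.get_score(importance_type='gain')` returns the same kind of per-feature gain summary directly.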
This explains how to measure the importance of individual features, but not the importance of feature interactions. I am curious about this question too. Hope someone can give an answer.
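One plausible way to extend the gain calculation from single features to interactions, and roughly how I read xgbfir's approach, is to credit each split's gain not only to its own feature but to the set of features encountered on the path from the root down to that split. The sketch below is only an illustration of that idea on a made-up toy tree, not xgbfir's actual code:

```python
from collections import defaultdict

# Toy tree in a dump-like nested form: each internal node has a split
# feature, a gain, and children. All values are invented for the example.
tree = {
    "feature": "age", "gain": 100.0,
    "children": [
        {"feature": "income", "gain": 40.0, "children": [
            {"leaf": True}, {"leaf": True},
        ]},
        {"feature": "city", "gain": 15.0, "children": [
            {"leaf": True}, {"leaf": True},
        ]},
    ],
}

def interaction_gains(node, path=()):
    """Credit each split's gain to the set of features on its root path."""
    gains = defaultdict(float)
    if node.get("leaf"):
        return gains
    features = tuple(sorted(set(path + (node["feature"],))))
    gains[features] += node["gain"]
    for child in node.get("children", []):
        for key, g in interaction_gains(child, path + (node["feature"],)).items():
            gains[key] += g
    return gains

print(dict(interaction_gains(tree)))
# ('age',) is credited 100.0, ('age', 'income') 40.0, ('age', 'city') 15.0
```

Under this reading, a depth-1 split contributes to a single-feature "interaction", a depth-2 split to a pair, and so on; summing these tuples over all trees would give interaction gain totals like the ones xgbfir reports. Whether this matches xgbfir's exact accounting would need checking against its source.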