scikit-tree
Decision Function on each node
Hello, I'm training an Oblique Decision Tree with 5 input features and a max_depth of 4. When I print model.tree_.threshold, I get something like this:
array([-0.14181135, 0.09716574, 0.12667369, -0.96893096, -2. ,
-2. , 0.78663591, -2. , -2. , -1.14851594,
-2. , 0.32079886, -2. , -2. , -0.69028786,
-0.52701822, 0.01895713, -2. , -2. , 0.02781246,
-2. , -2. , 0.35172133, -0.07838123, -2. ,
-2. , 0.59768865, -2. , -2. ])
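The -2. entries are almost certainly leaf nodes: in scikit-learn-style tree storage, leaves carry the sentinel value TREE_UNDEFINED (-2) in the threshold array, so only the remaining entries are real split thresholds. A minimal sketch (using a hypothetical, shortened threshold array in place of a fitted model's `model.tree_.threshold`):

```python
import numpy as np

# Hypothetical threshold array shaped like the one above; in a fitted
# model this would come from model.tree_.threshold.
threshold = np.array([-0.14181135, 0.09716574, -2.0, -2.0, 0.78663591])

# Leaves store the sentinel TREE_UNDEFINED (-2); internal (split) nodes
# hold real thresholds. Masking out the sentinel keeps only true splits.
is_internal = threshold != -2.0
internal_thresholds = threshold[is_internal]
print(internal_thresholds)
```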
I would like to know which linear combinations of features are taking place at each node, and what weight is assigned to each feature. Is there a way I can find this out?
Thank you in advance
I think you are looking for the projection matrix and want to expose a Python API to access it?
There is already tree.tree_.get_projection_matrix(); however, it is not exposed through a Python API. If you are interested in contributing a PR with a relevant unit test, you can add a method to ObliqueDecisionTreeClassifier/Regressor, PatchObliqueDecisionTreeClassifier/Regressor, and ExtraObliqueDecisionTreeClassifier/Regressor, and I can help review your PR.
possibly something like
@property
def projection_matrix_(self):
    # Should only work if the estimator is fitted; otherwise error out
    check_is_fitted(self)
    # Return the array from the Cython-level accessor
    return self.tree_.get_projection_matrix()
According to the mathematical formulation of oblique trees found on the project website, "let the data at node m be represented by Qm with nm samples. For each candidate split (ai, tm), consisting of a (possibly sparse) vector ai and threshold tm, partition the data into Qm,left and Qm,right subsets."
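Restating that partition rule as a formula (my reading of the quoted formulation; the left/right convention below follows scikit-learn's usual one, where samples satisfying the inequality go left):

$$
Q_{m,\mathrm{left}}(a_i, t_m) = \{(x, y) \in Q_m \mid a_i^{\top} x \le t_m\},
\qquad
Q_{m,\mathrm{right}}(a_i, t_m) = Q_m \setminus Q_{m,\mathrm{left}}(a_i, t_m)
$$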
array([[ 0., -1., 1., 0., 0.],
[ 0., 0., 1., -1., 0.],
[ 0., 0., 0., 1., 0.],
[ 0., 0., 0., -1., -1.],
[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0.],
...
[ 0., 0., 0., 0., 0.]])
I was expecting the vector ai to have real-valued entries, not just values from {-1, 0, 1}, corresponding to the weight assigned to each feature at each node.
So the way I am interpreting the matrix is that, for instance, at the first node, feature_1 is subtracted from feature_2, and if that value is larger than the threshold for that node, the sample moves to the left child, and so on. In other words, the feature coefficients are always 1 or -1, and features not used at a node have a coefficient of 0. Is this the proper way to interpret such a matrix?
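As a concrete sketch of that reading, here is the root-node row and threshold from the arrays above applied to a made-up 5-feature sample (the sample values are hypothetical; which child a sample then routes to follows the library's split convention):

```python
import numpy as np

# Row 0 of the projection matrix and entry 0 of the threshold array
# shown earlier in this thread (5-feature tree).
a_root = np.array([0.0, -1.0, 1.0, 0.0, 0.0])  # root-node projection vector
t_root = -0.14181135                            # root-node threshold

# A made-up sample with 5 feature values.
x = np.array([0.3, 0.1, 0.9, -0.2, 0.5])

# The node's split value is the linear combination <a_root, x>:
# here feature_2 minus feature_1, since the coefficients are 0/-1/+1.
split_value = x @ a_root
# The tree compares split_value against t_root to pick the child node.
print(split_value)
```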
Yep, that seems right.
Ok, that clears it up, thank you so much!