scikit-tree
Simplifying projection matrix rows
Is your feature request related to a problem? Please describe.
While developing a PMML converter for oblique trees (see #255), I noticed that the projection matrix (retrievable via the ObliqueTree.proj_vecs attribute) contains two types of "axis-aligned split" definitions (i.e. projection matrix rows where only a single element is non-zero).
These two types are:
- Positive/default axis-aligned split. The only non-zero row element is 1.0. For example, [0, 0, 1, 0].
- Negative axis-aligned split. The only non-zero row element is -1.0. For example, [0, -1, 0, 0].
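To make the two cases concrete, here is a minimal sketch of how a converter might tell them apart. It assumes the projection matrix is available as a dense 2-D NumPy array with one row per node; the helper name classify_rows and the example matrix are hypothetical and not part of scikit-tree.

```python
import numpy as np


def classify_rows(proj_mat):
    """Label each projection row (hypothetical helper, not a scikit-tree API).

    'positive': exactly one non-zero element, equal to 1.0
    'negative': exactly one non-zero element, equal to -1.0
    'other':    anything else (true oblique rows, all-zero rows, ...)
    """
    proj_mat = np.asarray(proj_mat, dtype=float)
    labels = []
    for row in proj_mat:
        nonzero = np.flatnonzero(row)  # indices of the non-zero elements
        if nonzero.size == 1 and row[nonzero[0]] == 1.0:
            labels.append("positive")
        elif nonzero.size == 1 and row[nonzero[0]] == -1.0:
            labels.append("negative")
        else:
            labels.append("other")
    return labels


# Made-up example matrix: the two rows quoted above plus one oblique row.
example = np.array([
    [0, 0, 1, 0],
    [0, -1, 0, 0],
    [0.5, 0, -1, 0],
])
labels = classify_rows(example)
print(labels)  # ['positive', 'negative', 'other']
```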
Describe the solution you'd like
I propose that all axis-aligned splits be standardized to the positive/default representation.
Negating a split condition adds no information, but it makes the resulting oblique tree harder to interpret, because the associated split threshold also appears negated.
- Positive/default split: feature <= threshold
- Negative split: -1 * feature <= -1 * threshold
In other words, the algorithm should not multiply standalone feature values by -1 during training. It should keep them as-is.
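As a quick check of the algebra (plain NumPy on synthetic data, not scikit-tree's training code), the sketch below verifies that the negative form selects exactly the samples with feature >= threshold. In other words, it cuts at the same point as a positive split on the un-negated feature, only with the two sides mirrored. The variable names and the threshold value are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=1000)  # synthetic values of a single feature
t = 0.3                    # illustrative threshold on the original scale

# Negative split as it would appear in the tree: -1 * feature <= -1 * threshold
negated = (-1 * x) <= (-1 * t)

# The same condition written on the original feature scale
mirrored = x >= t

# Both masks select exactly the same samples.
same_partition = np.array_equal(negated, mirrored)
print(same_partition)
```

Because the cut point is identical, a reader has to undo two negations (on the feature and on the threshold) to recover it, which is the interpretive overhead described above.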
Describe alternatives you've considered
The current behaviour (SkTree 0.7.2) is okay, but the resulting oblique trees are unnecessarily complicated.