Simplifying projection matrix rows

Open vruusmann opened this issue 10 months ago • 4 comments

Is your feature request related to a problem? Please describe.

While developing a PMML converter for oblique trees (see #255), I noticed that the projection matrix (retrievable via the ObliqueTree.proj_vecs attribute) contains two types of "axis aligned split" definitions (i.e. projection matrix rows where only a single row element is set to a non-zero value).

These two types are:

  • Positive/default axis aligned split. The only non-zero row element is 1.0. For example, [0, 0, 1, 0].
  • Negative axis aligned split. The only non-zero row element is -1.0. For example, [0, -1, 0, 0].
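To make the two row types concrete, here is a minimal sketch that classifies projection matrix rows. It assumes each row is a plain sequence of floats, as in the examples above. The helper name `classify_row` is hypothetical and not part of the scikit-tree API.

```python
def classify_row(row):
    """Classify one projection matrix row (hypothetical helper).

    Returns ("positive", index) or ("negative", index) for axis aligned
    rows, and ("other", None) for everything else.
    """
    # Collect the positions and values of all non-zero elements.
    nonzero = [(i, v) for i, v in enumerate(row) if v != 0]
    if len(nonzero) == 1:
        index, value = nonzero[0]
        if value == 1.0:
            return ("positive", index)
        if value == -1.0:
            return ("negative", index)
    return ("other", None)


rows = [
    [0, 0, 1, 0],   # positive/default axis aligned split
    [0, -1, 0, 0],  # negative axis aligned split
    [1, -1, 0, 0],  # genuinely oblique row
]
labels = [classify_row(row) for row in rows]
```

Running this kind of check over every row of a fitted tree's projection matrix would show how many negative axis aligned rows it contains.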

Describe the solution you'd like

I would propose that all axis aligned splits should be standardized to the positive/default axis aligned split representation.

Negating a split condition adds no information to it, but it makes the resulting oblique tree harder to interpret, because the associated split threshold value also appears negated.

  • Positive/default split: feature <= threshold
  • Negative split: -1 * feature <= -1 * threshold

In other words, the algorithm should not multiply standalone feature values by -1 during training. It should keep them as-is.
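The effect of a negative split can be illustrated with a small sketch. It assumes a sample goes to the left child when the projected value is `<=` the threshold, as in the split forms written above. The function `goes_left` and the sample values are made up for illustration and are not taken from scikit-tree.

```python
def goes_left(row, threshold, x):
    """Evaluate the assumed split rule: dot(row, x) <= threshold."""
    projection = sum(w * v for w, v in zip(row, x))
    return projection <= threshold


# A negative axis aligned split on feature 1 with a negated threshold.
neg_row, neg_threshold = [0, -1, 0, 0], -2.5

samples = [
    [0, 1.0, 0, 0],
    [0, 2.5, 0, 0],
    [0, 4.0, 0, 0],
]

# Decisions made by the negated form: -1 * feature <= -1 * threshold.
neg_decisions = [goes_left(neg_row, neg_threshold, x) for x in samples]

# The same decisions expressed on the raw feature: feature >= threshold.
raw_decisions = [x[1] >= 2.5 for x in samples]

assert neg_decisions == raw_decisions
```

The negated row and threshold encode nothing beyond a comparison on the raw feature with the inequality reversed, which is why a reader of the tree has to undo two sign flips to recover the actual condition.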

Describe alternatives you've considered

The current behaviour (SkTree 0.7.2) is okay, but the resulting oblique trees are unnecessarily complicated.

vruusmann, Apr 25 '24 07:04