
ENH enable multiview-oblique splitting & unlock multiclass restrictions

Open YuxinB opened this issue 7 months ago • 6 comments

Reference Issues/PRs

What does this implement/fix? Explain your changes.

Any other comments?

YuxinB avatar Mar 30 '25 23:03 YuxinB

I think this looks correct, but I have not validated it, and also it's usually a bit hard to validate Cython code.

I would recommend writing a small Cython test class that you can invoke from Python, which then calls sample_proj_mat for an input array. You can then visualize the result and verify that it does what you want for a set of features. From there, you can show manually that the MultiViewObliqueSplitter samples what you would expect to be the right projection matrix for multiple sets of features.

For example, see the MultiViewSplitterTester I wrote to test the MultiViewSplitter, and plot_multiview_axis_aligned_splitter.py.
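One way to do the manual check described above, once a projection matrix is available as a NumPy array, is to count how many nonzero weights each projection vector places in each feature set. The following is only a sketch with NumPy: `nonzeros_per_view` is a hypothetical helper, the hand-written `proj_mat` stands in for whatever the tester class returns, and the `feature_set_ends` argument is assumed to list the exclusive end index of each feature set.

```python
import numpy as np


def nonzeros_per_view(proj_mat, feature_set_ends):
    """Count nonzero weights per projection vector (row) and per feature set.

    Hypothetical helper for illustration; not part of the library.
    `feature_set_ends` is assumed to hold the exclusive end index of each view.
    """
    proj_mat = np.asarray(proj_mat)
    starts = [0] + list(feature_set_ends[:-1])
    counts = np.empty((proj_mat.shape[0], len(feature_set_ends)), dtype=int)
    for view, (start, end) in enumerate(zip(starts, feature_set_ends)):
        # Nonzero entries of each row restricted to this view's columns.
        counts[:, view] = np.count_nonzero(proj_mat[:, start:end], axis=1)
    return counts


# Hand-written stand-in for a sampled projection matrix:
# 3 projection vectors over 6 features, split into views [0, 2) and [2, 6).
proj_mat = np.array(
    [
        [1, 0, 0, 0, 0, -1],
        [0, 0, 1, 1, 0, 0],
        [0, -1, 0, 0, 0, 0],
    ]
)
view_counts = nonzeros_per_view(proj_mat, feature_set_ends=[2, 6])
print(view_counts)
```

Each row of `view_counts` shows how one projection vector spreads its weights across the views, which makes it easy to see whether sampling favors one feature set.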

Will do!

YuxinB avatar Mar 31 '25 17:03 YuxinB

@adam2392 Can you help me recheck the Multiview Oblique splitter when the uniform sampling condition is true? I modified it to achieve uniform sampling in a more reasonable way.

YuxinB avatar Apr 01 '25 16:04 YuxinB

@adam2392 Can you help me recheck the Multiview Oblique splitter when the uniform sampling condition is true? I modified it to achieve uniform sampling in a more reasonable way.

Did a loose check, and it looks like it's heading in the right direction. Will do an in-depth review once we can visually verify that the projection matrix makes sense.

adam2392 avatar Apr 02 '25 01:04 adam2392

I'm unsure why the docs are broken, but it would be nice to be able to check the output visually here: https://output.circle-artifacts.com/output/job/96dcd4fa-4fbc-4b29-b358-314687b9af0b/artifacts/0/dev/use.html via the circleCI job: https://circleci.com/gh/neurodata/treeple/561. It makes reviewing easier/trivial even.

This is what I got, though, when I ran plot_multiview_oblique.py locally. To me this seems weird because your feature_combinations is 2, but on average you're definitely sampling more than 2 feature indices per projection vector. Also, if this is an oblique combination, I'm unsure why the projection weights only take the value +1. For axis-aligned, we make them +1, but for oblique, we make them +1, -1, or 0. Without giving the code a deeper look, I conclude there's something not working. Open to being convinced though.

Figure_1

adam2392 avatar Apr 03 '25 01:04 adam2392

When I directly output the projection matrix from plot_multiview_oblique.py, I got this matrix: image. The number of features combined per projection is around 2 on average, and there are -1s, 0s, and 1s.
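The two properties discussed here (the average number of nonzero weights per projection vector, and which weight values occur) can be checked numerically rather than by eye. This is a minimal sketch with NumPy: `summarize_projection_matrix` is a hypothetical helper, and the small matrix is hand-written for illustration, not actual output from plot_multiview_oblique.py.

```python
import numpy as np


def summarize_projection_matrix(proj_mat):
    """Summarize sparsity and weight values of a projection matrix.

    Hypothetical helper for illustration; rows are assumed to be projection vectors.
    """
    proj_mat = np.asarray(proj_mat)
    nnz_per_row = np.count_nonzero(proj_mat, axis=1)
    return {
        # Average number of features combined per projection vector.
        "mean_nonzeros": float(nnz_per_row.mean()),
        # Distinct weight values that appear anywhere in the matrix.
        "weights": sorted(np.unique(proj_mat).tolist()),
    }


# Hand-written example: 4 projection vectors over 4 features.
proj_mat = np.array(
    [
        [1, 0, -1, 0],
        [0, 1, 0, 0],
        [-1, 0, 1, 1],
        [0, 0, 1, -1],
    ]
)
summary = summarize_projection_matrix(proj_mat)
print(summary)

# For an oblique splitter, weights are expected to stay within {-1, 0, +1}.
weights_ok = set(summary["weights"]) <= {-1, 0, 1}
```

Comparing `mean_nonzeros` against the configured feature_combinations, and checking that negative weights are present, would separate a sampling problem from a plotting problem.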

YuxinB avatar Apr 03 '25 01:04 YuxinB

Fixed the visualization bug: image image

YuxinB avatar Apr 03 '25 03:04 YuxinB