scikit-tree
ENH enable multiview-oblique splitting & unlock multiclass restrictions
Reference Issues/PRs
What does this implement/fix? Explain your changes.
Any other comments?
I think this looks correct, but I have not validated it, and also it's usually a bit hard to validate Cython code.
I would recommend writing a small Cython test class that you can invoke from Python, which then calls sample_proj_mat for an input array. You can then visualize and verify that it does what you want for a set of features. You can then manually show that the MultiViewObliqueSplitter samples what you would expect to be the right projection matrix for multiple sets of features. For example, see the MultiViewSplitterTester I wrote to test the MultiViewSplitter, and plot_multiview_axis_aligned_splitter.py.
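A minimal sketch of the kind of check suggested above, written in plain Python rather than against the Cython splitter. Everything here is an assumption made for illustration: sample_toy_proj_mat is a hypothetical stand-in for sample_proj_mat, feature_set_ends is assumed to mark where each view ends, and the rule that each projection vector draws from exactly one view is a guess at the intended behaviour, not a description of the PR's code.

```python
import random


def sample_toy_proj_mat(feature_set_ends, n_projections, feature_combinations, seed=0):
    """Toy stand-in for an oblique multiview sampler (NOT the Cython code).

    Assumption: each projection vector picks one view at random, draws
    `feature_combinations` distinct features from it, and gives each
    selected feature a weight of +1 or -1.
    """
    rng = random.Random(seed)
    n_features = feature_set_ends[-1]
    starts = [0] + list(feature_set_ends[:-1])
    proj_mat = []
    for _ in range(n_projections):
        view = rng.randrange(len(feature_set_ends))
        lo, hi = starts[view], feature_set_ends[view]
        k = min(feature_combinations, hi - lo)  # a view may be smaller than k
        row = [0] * n_features
        for j in rng.sample(range(lo, hi), k):
            row[j] = rng.choice([-1, 1])
        proj_mat.append(row)
    return proj_mat


def views_touched(row, feature_set_ends):
    """Return the indices of the views that a projection vector draws from."""
    starts = [0] + list(feature_set_ends[:-1])
    return [
        v
        for v, (lo, hi) in enumerate(zip(starts, feature_set_ends))
        if any(row[lo:hi])
    ]


# Three hypothetical views covering features [0, 3), [3, 8) and [8, 10).
feature_set_ends = [3, 8, 10]
proj_mat = sample_toy_proj_mat(
    feature_set_ends, n_projections=6, feature_combinations=2
)
for row in proj_mat:
    print(row, "views:", views_touched(row, feature_set_ends))
```

With a real tester class, the same views_touched check could be run on the matrix it returns, and the matrix could be shown as a heatmap to confirm the pattern by eye.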
Will do!
@adam2392 Can you help me recheck the multiview oblique splitter for the case where the uniform sampling condition is true? I modified it so that it achieves uniform sampling in a more reasonable way.
Did a loose check, and it looks in the right direction. Will do an in-depth review after we can visually verify the projection matrix makes sense.
I'm unsure why the docs are broken, but it would be nice to be able to check the output visually here: https://output.circle-artifacts.com/output/job/96dcd4fa-4fbc-4b29-b358-314687b9af0b/artifacts/0/dev/use.html via the circleCI job: https://circleci.com/gh/neurodata/treeple/561. It makes reviewing easier/trivial even.
This is what I got though when I ran plot_multiview_oblique.py locally. To me this seems weird because your feature_combinations is 2, but on average, you're definitely sampling more than 2 feature indices per projection vector. Also, if this is an oblique combination, I'm unsure why the projection weights are only of value +1? For axis-aligned, we make them +1, but for oblique, we made them +/- 1, or 0. Without giving the code a deeper look, I conclude there's something not working. Open to being convinced tho.
When I directly output the projection matrix from plot_multiview_oblique.py, I got this matrix:
The feature_combination is around 2 on average. And there are -1s, 0s and 1s.
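The two properties under discussion (about feature_combinations non-zero entries per projection vector on average, and weights drawn from -1, 0 and +1) can be checked numerically rather than by eye. The sketch below is illustrative only: projection_stats is a hypothetical helper, and the example matrix is made up for the demonstration, not the output of plot_multiview_oblique.py.

```python
def projection_stats(proj_mat):
    """Summarize a projection matrix given as a list of weight rows."""
    nonzeros_per_row = [sum(1 for w in row if w != 0) for row in proj_mat]
    weights = {w for row in proj_mat for w in row}
    return {
        "mean_nonzeros": sum(nonzeros_per_row) / len(proj_mat),
        "weights": weights,
    }


# Made-up matrix: 5 projection vectors over 6 features (not real output).
example = [
    [1, 0, -1, 0, 0, 0],
    [0, 0, 0, -1, 1, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, -1],
    [-1, 1, 0, 1, 0, 0],
]
stats = projection_stats(example)
print(stats["mean_nonzeros"])    # 2.0
print(sorted(stats["weights"]))  # [-1, 0, 1]
```

A matrix whose weights set came back as only 0 and 1 would point to the axis-aligned behaviour described in the earlier review comment.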
Done with the visualization bug: