scikit-tree icon indicating copy to clipboard operation
scikit-tree copied to clipboard

Adding streaming trees for all oblique split classifiers

Open adam2392 opened this issue 2 years ago • 4 comments

https://github.com/neurodata/scikit-learn/pull/54 will introduce the partial_fit api for classification trees

I just reviewed the code, and I think we need to update Cython code based on the _set_split_node API being overridden already within _oblique_tree: https://github.com/neurodata/scikit-tree/blob/b44c9518057f44060cfc8cc0d08148de87685ce3/sktree/tree/_oblique_tree.pyx#L232-L239.

Rn _set_split_node assumes the node_id is self.node_count. Perhaps we should refactor sklearn _set_split_node to get passed in the node_id and make relevant changes here in _oblique_tree? That way https://github.com/neurodata/scikit-learn/pull/54 should work out of the box for oblique tree classifiers.

cc: @PSSF23 do you want to handle this refactoring inside your current partial_fit PR?

adam2392 avatar Aug 11 '23 19:08 adam2392

The commit here should enable the partial_fit without really too much effort: https://github.com/neurodata/scikit-learn/pull/54/commits/368df7a381b14952f9cee477a7e81b3768dcd2d0

adam2392 avatar Aug 14 '23 17:08 adam2392

Think this is now fully enabled by #114.

@PSSF23 lmk how this works. Will be exciting to see this in action.

adam2392 avatar Aug 22 '23 20:08 adam2392

I found some segmentation fault bug with cython on tree level partial_fit, trying to resolve it.

PSSF23 avatar Aug 23 '23 19:08 PSSF23

As of now this is mostly addressed w/ the PRs in our upstream scikit-learn fork.

However, as discussed offline, there are some issues with regards to seg faults that occur in the check_estimator unit-tests with certain Forests when doing partial_fit. I'll leave this issue open until #118 is fixed.

adam2392 avatar Sep 12 '23 20:09 adam2392