scikit-tree
scikit-tree copied to clipboard
Adding streaming trees for all oblique split classifiers
https://github.com/neurodata/scikit-learn/pull/54 will introduce the partial_fit api for classification trees
I just reviewed the code, and I think we need to update Cython code based on the _set_split_node API being overridden already within _oblique_tree: https://github.com/neurodata/scikit-tree/blob/b44c9518057f44060cfc8cc0d08148de87685ce3/sktree/tree/_oblique_tree.pyx#L232-L239.
Rn _set_split_node assumes the node_id is self.node_count. Perhaps we should refactor sklearn _set_split_node to get passed in the node_id and make relevant changes here in _oblique_tree? That way https://github.com/neurodata/scikit-learn/pull/54 should work out of the box for oblique tree classifiers.
cc: @PSSF23 do you want to handle this refactoring inside your current partial_fit PR?
The commit here should enable the partial_fit without really too much effort: https://github.com/neurodata/scikit-learn/pull/54/commits/368df7a381b14952f9cee477a7e81b3768dcd2d0
Think this is now fully enabled by #114.
@PSSF23 lmk how this works. Will be exciting to see this in action.
I found some segmentation fault bug with cython on tree level partial_fit, trying to resolve it.
As of now this is mostly addressed w/ the PRs in our upstream scikit-learn fork.
However, as discussed offline, there are some issues with regards to seg faults that occur in the check_estimator unit-tests with certain Forests when doing partial_fit. I'll leave this issue open until #118 is fixed.