
Need to track constant features in multi-view decision trees

Open adam2392 opened this issue 1 year ago • 1 comment

The only difference between our splitters and the ones in scikit-learn is that the scikit-learn splitters efficiently track columns that are "constant", i.e. columns that take a single value across the samples reaching a node. Because every descendant node only sees a subset of those samples, this guarantees that no lower node will try to split on such a constant feature.

This can actually affect performance. When max_features is, say, 0.3, each split randomly considers 30% of the features. If the data contains a very high amount of noise, then at some node depth in some tree all of the sampled features may be noise and therefore only offer constant splits. Currently, the oblique splitters will still split the samples in that case rather than stopping.
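To make the idea concrete, here is a minimal sketch of constant-feature tracking for a plain axis-aligned node, written in NumPy. It is not the scikit-learn or scikit-tree implementation: the helper names `constant_features` and `draw_candidates`, the tolerance value, and the choice to scan every column eagerly are all assumptions made for illustration. The point is only that a child inherits its parent's constant set, and that a node with no usable candidates can stop instead of splitting.

```python
import numpy as np


def constant_features(X, node_samples, inherited, tol=1e-7):
    """Return the set of feature indices that are constant on `node_samples`.

    `inherited` is the set already known to be constant at the parent node.
    A feature that is constant on the parent's samples is constant on any
    subset of them, so inherited features are never rechecked.
    Hypothetical helper; `tol` is an illustrative tolerance.
    """
    constant = set(inherited)
    for j in range(X.shape[1]):
        if j in constant:
            continue
        col = X[node_samples, j]
        if col.max() - col.min() <= tol:
            constant.add(j)
    return constant


def draw_candidates(n_features, max_features, constant, rng):
    """Draw up to `max_features` candidate features, skipping constant ones.

    An empty result means the node has nothing left to split on and should
    become a leaf. Hypothetical helper, not a library API.
    """
    usable = [j for j in range(n_features) if j not in constant]
    k = min(max_features, len(usable))
    if k == 0:
        return np.array([], dtype=int)
    return rng.choice(usable, size=k, replace=False)


rng = np.random.default_rng(0)
X = np.array([[1.0, 0.0, 5.0],
              [1.0, 1.0, 5.0],
              [1.0, 2.0, 6.0],
              [1.0, 3.0, 6.0]])

root_const = constant_features(X, np.arange(4), inherited=set())
child_const = constant_features(X, np.array([0, 1]), inherited=root_const)
leaf_const = constant_features(X, np.array([0]), inherited=child_const)

child_candidates = draw_candidates(X.shape[1], 2, child_const, rng)
leaf_candidates = draw_candidates(X.shape[1], 2, leaf_const, rng)
```

In this toy data, column 0 is constant at the root, column 2 becomes constant in the child that holds the first two rows, and a node holding a single row has no usable features at all, so `leaf_candidates` is empty. For oblique or multi-view splitters the analogous check would need to apply to the projected feature (and, presumably, per view), which is the part this issue asks for.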

adam2392 · Nov 29 '23 23:11

This was first noticed while testing in https://github.com/neurodata/scikit-tree/pull/172

adam2392 · Nov 29 '23 23:11