boruta_py
No n_features_to_select parameter
Although I understand that Boruta is, by design, an all-relevant feature selection method, it would be nice to have the option to select a specified number of features.
As of right now, BorutaPy presents ranking 1 through 3 (relevant, tentative, rejected).
I am thinking of looking through the statistical tests and returning the ranking by p-value. If you like this issue and have a clear idea of how to implement it, let me know.
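To make the p-value idea concrete, here is a rough sketch, not BorutaPy's actual code. It assumes we can get per-feature hit counts (how many iterations a feature beat the best shadow feature), which as far as I can tell are not exposed as a public attribute, so `hit_counts` and both function names are hypothetical. It uses a plain one-sided binomial test with p = 0.5 and ignores the multiple-testing correction that BorutaPy applies.

```python
from math import comb


def binom_sf(hits, n_iter, p=0.5):
    """P(X >= hits) for X ~ Binomial(n_iter, p).

    One-sided p-value for "this feature beats the shadows more often than chance".
    """
    return sum(
        comb(n_iter, k) * p**k * (1 - p) ** (n_iter - k)
        for k in range(hits, n_iter + 1)
    )


def top_k_by_p_value(hit_counts, n_iter, k):
    """Return the indices of the k features with the smallest p-values.

    Ties are broken by feature index, which is an arbitrary choice.
    """
    p_values = [binom_sf(h, n_iter) for h in hit_counts]
    order = sorted(range(len(hit_counts)), key=lambda i: (p_values[i], i))
    return order[:k], p_values


# Hypothetical hit counts for 5 features over 20 iterations.
hit_counts = [20, 3, 17, 10, 19]
selected, p_values = top_k_by_p_value(hit_counts, n_iter=20, k=3)
print(selected)  # [0, 4, 2]
```

One caveat: every feature that wins in all iterations gets the same p-value, so a ranking like this would probably need a secondary key (for example median importance) to separate the strongest features.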
I am trying to work on it on my fork.
I know this doesn't directly answer your question. When I want to minimize the number of features, I often do a feature reduction after the all-relevant feature selection step. I use forward or backward stepwise feature elimination, depending on whether you want to choose very few features or only drop a few, respectively. I have also found that some simulated annealing helps a lot in practice.
This might help in practice because highly correlated features will all have high p-values. So if you rank by p-value alone, you might throw out features which are less statistically relevant but have more orthogonal value.
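Here is a minimal sketch of the forward stepwise step, run on the features Boruta confirmed. Everything in it is illustrative: `forward_select`, `toy_score`, and the feature names are made up, and the toy scorer stands in for what would normally be a cross-validated model score (for example from scikit-learn's `cross_val_score`). The toy data has two features carrying the same signal, to show why a greedy subset search can prefer a weaker but non-redundant feature.

```python
def forward_select(candidates, score_fn, k):
    """Greedy forward selection.

    Starting from an empty set, repeatedly add the candidate whose
    inclusion gives the highest score_fn(subset), until k are chosen.
    """
    selected = []
    remaining = list(candidates)
    while remaining and len(selected) < k:
        best = max(remaining, key=lambda f: score_fn(selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy setup (hypothetical): "a" and "a_copy" carry the same signal.
SIGNAL = {"a": "s1", "a_copy": "s1", "b": "s2", "c": "s3"}
STRENGTH = {"a": 0.9, "a_copy": 0.8, "b": 0.5, "c": 0.3}


def toy_score(subset):
    """Sum the strongest feature per distinct signal; redundant features add nothing."""
    best = {}
    for f in subset:
        s = SIGNAL[f]
        best[s] = max(best.get(s, 0.0), STRENGTH[f])
    return sum(best.values())


confirmed = ["a", "a_copy", "b", "c"]  # e.g. the features Boruta kept
print(forward_select(confirmed, toy_score, k=2))  # ['a', 'b']
```

Picking the top 2 by individual strength would give `a` and `a_copy`, which are redundant. The greedy search picks `a` and `b` instead. Backward elimination is the mirror image: start from all confirmed features and repeatedly drop the one whose removal hurts the score least.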
Sorry for the tangent, but I thought it might help.