
No n_features_to_select parameter

Open bgalvao opened this issue 4 years ago • 1 comments

Although I understand that Boruta is, by design, an all-relevant feature selection method, it would be nice to have the option to select a specified number of features.

As of right now, BorutaPy presents ranking 1 through 3 (relevant, tentative, rejected).

I am thinking of looking through the statistical tests and returning a ranking by p-value. If you like this idea and have a clear view of how to implement it, let me know.

I am trying to work on it on my fork.
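To make the p-value idea concrete, here is a minimal sketch, not BorutaPy's API. It assumes per-feature "hit" counts are available, i.e. how many iterations each feature's importance beat the shadow features, and ranks features by a one-sided binomial p-value against a 0.5 chance of a hit. The names `hit_pvalue` and `top_k_by_pvalue` and the numbers in `hit_counts` are made up for illustration.

```python
from math import comb


def hit_pvalue(hits, n_iter):
    """One-sided binomial p-value P(X >= hits) for X ~ Binomial(n_iter, 0.5)."""
    tail = sum(comb(n_iter, i) for i in range(hits, n_iter + 1))
    return tail / 2 ** n_iter


def top_k_by_pvalue(hit_counts, n_iter, k):
    """Return the indices of the k features with the smallest p-values."""
    pvals = [hit_pvalue(h, n_iter) for h in hit_counts]
    # Sort by p-value; break ties by feature index to stay deterministic.
    order = sorted(range(len(pvals)), key=lambda i: (pvals[i], i))
    return order[:k], pvals


# Hypothetical hit counts for 4 features over 10 iterations.
hit_counts = [10, 5, 0, 9]
selected, pvals = top_k_by_pvalue(hit_counts, n_iter=10, k=2)
print(selected)  # [0, 3]
```

A real implementation would also have to decide how to handle multiple-testing correction and features that were rejected early, which this sketch ignores.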

bgalvao, Jan 26 '21 10:01

I know this doesn't directly answer your question. When I want to minimize the number of features, I often do a feature reduction step after the all-relevant feature selection: forward stepwise selection if I want to end up with very few features, or backward stepwise elimination if I only want to drop a few. I have also found that some simulated annealing helps a lot in practice.

This might help in practice because highly correlated features will all have high p-values. So, ranking by p-value alone, you might throw out features which are less statistically relevant but have more orthogonal value.
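As a rough sketch of the forward stepwise step (my own toy illustration, not something from BorutaPy): start from the features that survived the all-relevant selection and greedily add whichever one improves a subset score the most. Here `forward_select`, `toy_score`, the relevance values and the redundancy penalty are all hypothetical; in practice the score would be something like a cross-validated model metric.

```python
def forward_select(candidates, score_fn, k):
    """Greedy forward selection: add the best-scoring feature until k are chosen."""
    selected = []
    remaining = list(candidates)
    while remaining and len(selected) < k:
        # Pick the candidate whose addition gives the highest subset score.
        best = max(remaining, key=lambda f: score_fn(selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy setup: "a_copy" is nearly as relevant as "a" but redundant with it.
relevance = {"a": 0.9, "a_copy": 0.85, "b": 0.5, "c": 0.1}
redundant_pairs = {frozenset(("a", "a_copy"))}


def toy_score(subset):
    """Sum of individual relevance minus a penalty for each redundant pair."""
    total = sum(relevance[f] for f in subset)
    penalty = sum(
        0.8
        for i, f in enumerate(subset)
        for g in subset[i + 1:]
        if frozenset((f, g)) in redundant_pairs
    )
    return total - penalty


chosen = forward_select(relevance, toy_score, k=2)
print(chosen)  # ['a', 'b']
```

Ranking on individual relevance alone would pick `a` and `a_copy`; the subset-level score picks `b` instead because it adds information `a` does not already carry. scikit-learn's `SequentialFeatureSelector` (with `n_features_to_select` and `direction`) offers the same kind of forward or backward search using a cross-validated estimator score.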

Sorry for the tangent, but I thought it might help.

DreHar, Jan 26 '21 10:01