mrmr Usability within sklearn pipelines

Open apachaves opened this issue 3 years ago • 2 comments

@smazzanti, thank you for this package and for the Medium article, which explains MRMR very clearly. I hope the package keeps growing and improving, and I hope to contribute to it soon.

I have started trying it out myself, and I was wondering what the intended way is to use it within a sklearn pipeline. For instance, this is what the F ANOVA example in sklearn's docs looks like: https://scikit-learn.org/stable/auto_examples/feature_selection/plot_feature_selection_pipeline.html#sphx-glr-auto-examples-feature-selection-plot-feature-selection-pipeline-py

Would the idea be to wrap mrmr_classif in a SelectKBest object as well, so that it can be used as a step in a sklearn Pipeline? If so, how should we control the K parameter inside the cross-validation? I saw that SelectKBest has its own k parameter too: https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.SelectKBest.html#sklearn.feature_selection.SelectKBest

This is just a quick question to better understand the proposal, so forgive me in advance for any mistakes or lack of clarity. I hope we can have a healthy discussion.

Wish you all the best!

Apr 19 '21 13:04 apachaves

Hi Anderson,

thank you, I'm glad that you found it useful!

SelectKBest needs a function that returns a score for each of the features. So, yes, you can use mrmr_classif with it, but you will need to modify the function a bit so that it returns the scores rather than the list of selected features.

Hope this helps, Samuele
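
For readers landing on this thread, here is a minimal sketch of what that modification could look like. It is not the package's code: `mrmr_rank_scores` is a hypothetical name, and the greedy loop inside it is a simplified stand-in for MRMR (F-statistic relevance divided by mean absolute correlation with the features already picked), written with NumPy and sklearn only. The idea is to turn the greedy selection order into scores, so that SelectKBest with k keeps the first k features of that order, and k can then be tuned in cross-validation through the usual `step__param` name.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline


def mrmr_rank_scores(X, y):
    """Hypothetical score function: greedy mRMR order turned into scores.

    Simplified stand-in, NOT the mrmr package's implementation.
    The feature picked first gets the highest score, so SelectKBest(k)
    keeps exactly the first k features of the greedy order.
    """
    X = np.asarray(X, dtype=float)
    n_features = X.shape[1]

    # Relevance: ANOVA F-statistic of each feature against the target.
    relevance, _ = f_classif(X, y)
    relevance = np.nan_to_num(relevance)  # constant columns give NaN

    # Redundancy: absolute Pearson correlation between features,
    # floored so the quotient below never divides by zero.
    corr = np.abs(np.nan_to_num(np.corrcoef(X, rowvar=False)))
    corr = np.clip(corr, 1e-3, None)

    selected, remaining = [], list(range(n_features))
    scores = np.zeros(n_features)
    for rank in range(n_features):
        if selected:
            redundancy = corr[np.ix_(remaining, selected)].mean(axis=1)
        else:
            redundancy = np.ones(len(remaining))  # first pick: relevance only
        best = remaining[int(np.argmax(relevance[remaining] / redundancy))]
        scores[best] = n_features - rank  # earlier pick -> higher score
        selected.append(best)
        remaining.remove(best)
    return scores


# Synthetic data, only to make the sketch runnable.
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=5, n_redundant=5, random_state=0
)

pipe = Pipeline(
    [
        ("select", SelectKBest(score_func=mrmr_rank_scores)),
        ("clf", LogisticRegression(max_iter=1000)),
    ]
)

# SelectKBest's own k is the number of features kept; tune it like any
# other pipeline hyperparameter. The selector is refit on each training fold.
search = GridSearchCV(pipe, param_grid={"select__k": [3, 5, 10]}, cv=5)
search.fit(X, y)
print(search.best_params_)
```

Because the selector sits inside the Pipeline, the ranking is recomputed on the training part of every fold, which avoids leaking information from the validation data into the feature selection.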

Apr 19 '21 21:04 smazzanti

All right. I will try to make the modifications and add that as a feature. I cannot guarantee yet that I will be able to do it quickly, though.

Apr 21 '21 14:04 apachaves
