FES Feature selection: potentially add stability selection + knockoffs, use predictive measure for variable scoring instead of p-values

Open alexpghayes opened this issue 5 years ago • 0 comments

It would be nice to include sections on stability selection + knockoffs + cousins / work in that universe. I don't know much about this but it's a big topic of late, especially in the high dimensional ML community. Nice overview (probably a bit out of date by now?) at https://www.stat.cmu.edu/~ryantibs/journalclub/stability.pdf.
The Simple Filters section encourages feature selection based on p-values from some sort of GLM / GAM / etc. While this is a standard approach, significant features are not necessarily predictive (see http://biorxiv.org/lookup/doi/10.1101/327437 for example). Scoring predictors based on an actual predictive measure seems like a better recommendation (LOOCV / PRESS for people who don't have time for randomization tests, or resampled error estimates / permuted LOOCV for people who do).

Either way, it would be interesting to see simulations comparing various feature "scores" for their efficacy in selecting features via simple filters.
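To make the LOOCV / PRESS suggestion concrete, here is a rough sketch of what a predictive score for a simple filter could look like. Everything in it is an assumption for illustration: the helper names `press_score` and `rank_features_by_press` are made up, each predictor is scored with a one-variable linear regression (the book's filters may well use other models), and the data are simulated. It relies on the standard OLS identity that the leave-one-out residual is `e_i / (1 - h_ii)`, so no refitting is needed.

```python
import numpy as np


def press_score(x, y):
    """PRESS for a one-predictor linear regression of y on x (with intercept).

    Hypothetical helper: uses the OLS shortcut e_i / (1 - h_ii) for the
    leave-one-out residuals instead of refitting n times.
    """
    X = np.column_stack([np.ones_like(x), x])
    # Diagonal of the hat matrix from the thin QR factor: h_ii = sum_k Q[i, k]^2
    Q, _ = np.linalg.qr(X)
    h = np.sum(Q ** 2, axis=1)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    loo_resid = resid / (1.0 - h)
    return float(np.sum(loo_resid ** 2))


def rank_features_by_press(X, y):
    """Score every column of X separately; lower PRESS means more predictive."""
    scores = np.array([press_score(X[:, j], y) for j in range(X.shape[1])])
    return scores, np.argsort(scores)


# Simulated data (an assumption): only column 0 carries signal.
rng = np.random.default_rng(0)
n, p = 200, 5
X = rng.normal(size=(n, p))
y = 2.0 * X[:, 0] + rng.normal(size=n)

scores, order = rank_features_by_press(X, y)
print("PRESS per feature:", np.round(scores, 1))
print("Features ranked best to worst:", order)
```

The same loop could compute p-values alongside PRESS, which would give one small version of the simulation comparison mentioned above. Swapping in resampled error estimates or a permutation-based variant would only require replacing `press_score`.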

Feb 26 '19 04:02 alexpghayes