FES
Feature selection: potentially add stability selection + knockoffs, use predictive measure for variable scoring instead of p-values
-
It would be nice to include sections on stability selection + knockoffs + cousins / work in that universe. I don't know much about this, but it's been a big topic of late, especially in the high-dimensional ML community. Nice overview (probably a bit out of date by now?) at https://www.stat.cmu.edu/~ryantibs/journalclub/stability.pdf.
-
The Simple Filters section encourages feature selection based on p-values from some sort of GLM / GAM / etc. While this is a standard approach, significant features are not necessarily predictive (see http://biorxiv.org/lookup/doi/10.1101/327437 for example). Scoring predictors on an actual predictive measure seems like a better recommendation (LOOCV / PRESS for people who don't have time for randomization tests, or resampled error estimates / permuted LOOCV for people who do).
Either way, it would be interesting to see simulations comparing various feature "scores" for their efficacy in selecting features via simple filters.
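
To make the LOOCV / PRESS suggestion concrete, here is a minimal sketch in Python (not the book's R code) that scores each predictor by the PRESS of a univariate linear model. It assumes ordinary least squares, where the leave-one-out residual has the closed form e_i / (1 - h_ii), so no refitting is needed. The function names `press_score` and `rank_by_press` and the simulated data are hypothetical, for illustration only.

```python
import numpy as np

def press_score(x, y):
    """PRESS (sum of squared leave-one-out errors) for the OLS fit y ~ 1 + x."""
    X = np.column_stack([np.ones_like(x), x])
    # Leverages h_ii are the squared row norms of Q from a thin QR of X.
    Q, _ = np.linalg.qr(X)
    h = np.sum(Q**2, axis=1)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    # Closed-form leave-one-out residuals for OLS: e_i / (1 - h_ii).
    return np.sum((resid / (1.0 - h)) ** 2)

def rank_by_press(X, y):
    """Return predictor indices sorted from lowest (best) to highest PRESS."""
    scores = np.array([press_score(X[:, j], y) for j in range(X.shape[1])])
    return np.argsort(scores), scores

# Toy simulation (an assumption, not data from the book):
# only column 0 carries signal, the other nine are pure noise.
rng = np.random.default_rng(0)
n, p = 200, 10
X = rng.normal(size=(n, p))
y = 2.0 * X[:, 0] + rng.normal(size=n)

order, scores = rank_by_press(X, y)
```

Lower PRESS means better out-of-sample prediction, so `order` lists predictors from most to least useful under this score. The same loop could compute a p-value per predictor alongside the PRESS, which would be one way to set up the comparison of scores suggested above.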