machine-learning
Median absolute deviation feature selection
@gwaygenomics presented evidence that median absolute deviation (MAD) feature selection (selecting the genes with the highest MADs) can eliminate most features without hurting performance: https://github.com/cognoma/machine-learning/pull/18#issuecomment-236265506. In fact, performance appears to have increased with the feature selection, which would make sense if the selection enriched for predictive features and thereby increased the signal-to-noise ratio.
Therefore, I think we should investigate this method of feature selection further. Specifically, I'm curious whether:
- @gwaygenomics' findings hold true for outcomes other than RAS.
- MAD is better than MAD / median. I think MAD could be biased against selecting genes that are lowly expressed but still variable.
- MAD outperforms random selection of a feature set of the same size.
- MAD performs well for algorithms other than logistic regression.
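
To make the comparisons above concrete, here is a minimal NumPy sketch of the three selection strategies: top-k by MAD, top-k by MAD / median, and a random baseline of the same size. The function names (`mad`, `top_k_by_mad`, `random_k`), the `eps` guard against a zero median, and the toy matrix are all hypothetical and are not taken from the linked pull request. It assumes samples are rows and genes are columns.

```python
import numpy as np


def mad(X):
    """Median absolute deviation of each column (gene) of X."""
    med = np.median(X, axis=0)
    return np.median(np.abs(X - med), axis=0)


def top_k_by_mad(X, k, scale_by_median=False, eps=1e-8):
    """Sorted column indices of the k genes with the highest MAD.

    With scale_by_median=True the score is MAD / |median| instead.
    eps is an assumed guard against division by a zero median.
    """
    score = mad(X)
    if scale_by_median:
        score = score / (np.abs(np.median(X, axis=0)) + eps)
    top = np.argsort(score)[::-1][:k]
    return np.sort(top)


def random_k(n_features, k, seed=0):
    """Sorted indices of k randomly chosen genes (baseline of the same size)."""
    rng = np.random.default_rng(seed)
    return np.sort(rng.choice(n_features, size=k, replace=False))


# Toy expression matrix: 5 samples x 4 genes (made-up values).
# Gene 0 is highly expressed, gene 1 is lowly expressed but variable,
# genes 2 and 3 are constant.
X = np.array([
    [100.0, 1.0, 5.0, 0.0],
    [110.0, 2.0, 5.0, 0.0],
    [90.0, 3.0, 5.0, 0.0],
    [120.0, 4.0, 5.0, 0.0],
    [80.0, 5.0, 5.0, 0.0],
])

print(mad(X))                                   # per-gene MAD
print(top_k_by_mad(X, 1))                       # picks gene 0
print(top_k_by_mad(X, 1, scale_by_median=True)) # picks gene 1
print(random_k(X.shape[1], 2))                  # random baseline
```

On this toy matrix, plain MAD picks the highly expressed gene while MAD / median picks the lowly expressed but variable one, which is the bias the second bullet asks about. Answering the questions themselves would still require fitting and cross-validating a classifier on each selected feature set, which this sketch leaves out.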
I'm labeling this issue a task, so please investigate if you feel inclined.