
Median absolute deviation feature selection

Open dhimmel opened this issue 7 years ago • 4 comments

@gwaygenomics presented evidence that median absolute deviation (MAD) feature selection (selecting genes with the highest MADs) can eliminate most features without hurting performance: https://github.com/cognoma/machine-learning/pull/18#issuecomment-236265506. In fact, it appears that performance increased with the feature selection, which could make sense if the selection enriched for predictive features, increasing the signal-to-noise ratio.

Therefore, I think we should investigate this method of feature selection further. Specifically, I'm curious whether:

  • @gwaygenomics' findings hold for outcomes other than RAS
  • MAD is better than MAD / median (I suspect MAD is biased against selecting genes that are lowly expressed but still variable)
  • MAD outperforms random selection of the same number of features
  • MAD performs well for algorithms other than logistic regression
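
To make the comparison concrete, here is a minimal sketch of the three selection strategies in question: top-k by MAD, top-k by MAD / median, and a random baseline of the same size. It uses only the Python standard library. The function names, the `eps` guard against zero medians, and the toy expression values are all assumptions for illustration; none of this comes from the cognoma codebase or from @gwaygenomics' analysis.

```python
import random
from statistics import median


def mad(values):
    """Median absolute deviation: median of |x - median(x)|."""
    center = median(values)
    return median(abs(x - center) for x in values)


def scaled_mad(values, eps=1e-9):
    """MAD divided by the median. eps is a hypothetical guard against zero medians."""
    return mad(values) / (abs(median(values)) + eps)


def top_features(expression, score, k):
    """Return the k feature names with the highest score, best first."""
    ranked = sorted(expression, key=lambda gene: score(expression[gene]), reverse=True)
    return ranked[:k]


def random_features(expression, k, seed=0):
    """Baseline: k feature names drawn uniformly at random (seeded for repeatability)."""
    rng = random.Random(seed)
    return rng.sample(sorted(expression), k)


# Toy data (made up): gene name -> expression values across five samples.
expression = {
    "high_var_high_expr": [100, 140, 60, 180, 20],
    "low_var_high_expr": [100, 102, 98, 104, 96],
    "high_var_low_expr": [1, 3, 0.5, 4, 2],
    "low_var_low_expr": [1, 1.1, 0.9, 1.05, 0.95],
}

print(top_features(expression, mad, 2))
# ['high_var_high_expr', 'low_var_high_expr']
print(top_features(expression, scaled_mad, 2))
# ['high_var_low_expr', 'high_var_high_expr']
print(random_features(expression, 2))
```

In this toy example, plain MAD ranks a stable but highly expressed gene above a lowly expressed gene with large relative variation, while MAD / median reverses that order. Whether the same effect matters on real expression data, and for classifier performance, is exactly what the questions above would need to test.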

I'm labeling this issue a task, so please investigate if you feel inclined.

dhimmel, Aug 01 '16 15:08