machine-learning
Decisions required to reach a minimum viable product
We're nearing the point where we'll need to implement a machine learning module to execute user queries. We're looking to create a minimum viable product. We can expand functionality later, but for now let's focus on the simplest and most succinct implementation. There are several decisions to make:
- Classifier: which classifiers should we support? If we want to support only a single classifier for now, which one?
- Predictions: do we want to return probabilities, scores, or class predictions?
- Threshold: do we want to report performance measures that depend on a single classification threshold? Or do we want to report performance measures that span thresholds?
- Testing: Do we want to use a testing partition in addition to cross-validation? If so, do we refit a model on all observations?
- Features: Should we include covariates in addition to expression features (see #21)?
- Feature selection: Do we want to perform any feature selection?
- Feature extraction: Do we want to perform features extraction, such as PCA (see #43)?
So let's work out these choices, with a focus on simplicity.
Here are my thoughts:
- Classifier: `sklearn.linear_model.SGDClassifier` with a grid search to find the optimal `l1_ratio` and `alpha`. See `2.TCGA-MLexample.ipynb` for an example.
- Predictions: let's return all three, using the object names `probability`, `score`, and `class` under a `predictions` key. The frontend should handle cases where `probability` is absent.
- Threshold: Both.
- Testing: Let's hold out 10% for testing.
- Features: deferring this decision based on the maturity of #21.
- Feature selection: let's do MAD feature selection to 8000 genes based on @yl565's findings in https://github.com/cognoma/machine-learning/issues/22#issuecomment-238113032. This should help speed up fitting the elastic net without too much performance loss.
- Feature extraction: deferring this decision based on the maturity of #43.
@gwaygenomics, @yl565, @stephenshank: do you agree?
Can you clarify what you mean by number 3?

> Or do we want to report performance measures that span thresholds?

Like AUROC?
By "span thresholds" I'm referring to any measure computed from predicted probabilities/scores, such as AUROC or AUPRC. By "single classification threshold", I'm referring to any measure computed from predicted classes, such as precision, recall, accuracy, or F1 score.
got it. Then yes, this all looks good to me
+1
Sounds good!