ml4bio-workshop
Modify example datasets
There were a few instances where our sample datasets did not give the desired outcome, which made it hard to drive home the points we wanted to make about hyperparameter and model selection:
- Decision tree gave perfect accuracy on the example data
- Logistic regression with L1 regularization did not give a solution that ignored one of the features (i.e., drove that feature's weight to zero)
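One way to design an example dataset for the L1 case is to make one feature strongly informative and a second feature only weakly correlated with the label. The L1 penalty then pushes the weak feature's weight to exactly zero. The sketch below is a hand-rolled, stdlib-only L1-penalized logistic regression fitted by proximal gradient descent (ISTA). It is not the workshop software's implementation. The toy data, the function names (`fit_l1_logreg`, `soft_threshold`), and the step size and iteration count are all hypothetical choices made for illustration. In scikit-learn the analogous model is `LogisticRegression(penalty="l1", solver="liblinear")`, where `C` is the *inverse* of the regularization strength.

```python
import math


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(z: float, t: float) -> float:
    # Proximal operator of t * |w|: shrink toward zero and snap small values to exactly 0.0.
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def fit_l1_logreg(X, y, lam, lr=0.5, steps=2000):
    """Hypothetical helper: L1-penalized logistic regression via ISTA.

    The bias term is not penalized. Returns (weights, bias).
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(steps):
        # Residuals p_i - y_i under the current model.
        r = [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
             for xi, yi in zip(X, y)]
        grad_w = [sum(ri * xi[j] for ri, xi in zip(r, X)) / n for j in range(d)]
        grad_b = sum(r) / n
        # Gradient step on the smooth log-loss, then the L1 proximal (shrinkage) step.
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


# Hypothetical toy data: feature 0 separates the classes,
# feature 1 is only weakly correlated with the label.
X = [[-2.0, 0.5], [-2.0, -1.0], [-1.0, 0.5], [-1.0, -1.0], [-0.5, 0.5], [-0.5, -1.0],
     [0.5, 1.0], [0.5, -0.5], [1.0, 1.0], [1.0, -0.5], [2.0, 1.0], [2.0, -0.5]]
y = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
```

For example, `fit_l1_logreg(X, y, lam=0.3)` keeps a positive weight on feature 0 and sets feature 1's weight to exactly `0.0`. With `lam=0.0` both weights are nonzero. Sweeping `lam` between these values would let participants watch the weak feature drop out.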
Part of the challenge may be the random data splitting. Should we introduce an explicit random seed? Would that be a useful way to introduce reproducibility concepts, or would it complicate the workflow too much?
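If an explicit seed were added, the split itself could stay simple. Below is a minimal sketch assuming the split is done by shuffling row indices. The function name `train_test_split_indices` and its parameters are hypothetical, not part of the workshop code. A private `random.Random(seed)` generator makes the split identical on every run when a seed is given, and leaving `seed=None` keeps a fresh random split each time. scikit-learn's `train_test_split` exposes the same idea through its `random_state` parameter.

```python
import random


def train_test_split_indices(n, test_fraction=0.25, seed=None):
    """Hypothetical helper: shuffle row indices 0..n-1 and split off a test set.

    With seed=None each call draws fresh randomness; with an integer seed the
    split is the same on every run (and for every participant).
    """
    rng = random.Random(seed)  # private generator; leaves the global random state untouched
    idx = list(range(n))
    rng.shuffle(idx)
    n_test = round(n * test_fraction)
    train, test = sorted(idx[n_test:]), sorted(idx[:n_test])
    return train, test
```

A design option that might keep the workflow simple is a fixed default seed for the example datasets, so every participant sees the same outcome, with the seed exposed as an optional setting when reproducibility is the topic being taught.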