
General Workshop Improvements

Open: stemlock opened this issue 3 years ago • 0 comments

  • Replace Iris dataset
  • There is no baseline model for classification (decision tree?). What about logistic regression?
  • Some sections seem to lack explanations (e.g., feature importances, comparing different algorithms). Are ROC curves missing as well?
  • Cover other types of hyperparameter tuning (random search, Bayesian search)
  • XGBoost is generally considered the gold standard among shallow (non-deep-learning) models. Replace AdaBoost with it?
  • Code could be cleaned up in general and would benefit from more comments
  • The regression section would be a great place to introduce a general modeling pipeline: data cleaning, feature transformation, feature engineering (maybe not applicable here), model training, hyperparameter tuning/cross-validation, and model evaluation
  • No need for a separate dummyencoder class; this can be handled with scikit-learn's OneHotEncoder or even pandas get_dummies
  • If we are going to use transformers and pipelines, we should think about adding the model object to the pipeline as well. In general, this is better practice, since you can then save entire model pipelines rather than just feature transformation pipelines.
  • I typically see KNN used for simple classification rather than regression. Not sure it is necessary to include it
  • We don't talk about Naive Bayes in classification. I feel this is a canonical algorithm that could be introduced
  • There is no mention of any dimensionality reduction/latent variable techniques for clustering, which seems like a gap
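
To make a few of the points above concrete (OneHotEncoder instead of a custom encoder class, the model object inside the pipeline, logistic regression as a baseline, random search, and a ROC-based metric), here is a rough sketch of what this could look like. It is only an illustration, not the workshop's code: the toy data, the column names (x1, x2, color), and the search range for C are all made up for the example.

```python
import numpy as np
import pandas as pd
from scipy.stats import loguniform
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic toy data (hypothetical; stands in for whatever dataset replaces Iris).
rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "x1": rng.normal(size=n),
    "x2": rng.normal(size=n),
    "color": rng.choice(["red", "green", "blue"], size=n),
})
signal = (
    df["x1"]
    + 0.5 * df["x2"]
    + (df["color"] == "red").astype(float)
    + rng.normal(scale=0.5, size=n)
)
y = (signal > 0.5).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    df, y, test_size=0.25, random_state=0, stratify=y
)

# Feature transformation: scale numeric columns, one-hot encode the categorical one.
preprocess = ColumnTransformer([
    ("num", StandardScaler(), ["x1", "x2"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["color"]),
])

# The model is a step of the pipeline, so the whole thing can be saved as one object.
pipe = Pipeline([
    ("preprocess", preprocess),
    ("model", LogisticRegression(max_iter=1000)),
])

# Random search over the regularization strength of the baseline model.
search = RandomizedSearchCV(
    pipe,
    param_distributions={"model__C": loguniform(1e-3, 1e2)},
    n_iter=10,
    cv=5,
    scoring="roc_auc",
    random_state=0,
)
search.fit(X_train, y_train)

# Evaluate the tuned pipeline on held-out data with ROC AUC.
test_scores = search.best_estimator_.predict_proba(X_test)[:, 1]
test_auc = roc_auc_score(y_test, test_scores)
print("best params:", search.best_params_)
print("test ROC AUC:", round(test_auc, 3))
```

Since search.best_estimator_ is the full fitted pipeline, it could be persisted in one piece (e.g. with joblib.dump) and applied directly to raw data frames later; swapping LogisticRegression for another estimator only changes the "model" step.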

stemlock, Mar 03 '22 03:03