99-ML-Learning-Projects

[EXE] pt1: Simple Decision Tree exercise, pt2: Pipelines

Open iakovidva opened this issue 5 years ago • 1 comments

Learning Goals

Part 1:

  • Work with scikit-learn library, train-test set split, report different scores.
  • Decision Trees.

Part 2:

  • Work with Pipelines (with DecisionTrees), imputers, scalers and encoders.
  • Grid Search.

Exercise Statement

Part 1: Apply different Decision Trees to train a model for detecting breast cancer using the breast-cancer-wisconsin-diagnostic-dataset (scikit-learn 7.2.7. Breast cancer wisconsin (diagnostic) dataset). The goal is to predict whether a breast cancer case is Malignant or Benign.

Part 2: Apply various transformations (imputers, encoders, scalers) using Pipelines with DecisionTreeClassifiers. Use grid search to find the best parameters. The goal is to predict whether income exceeds $50K/yr based on census data.

Prerequisites

DecisionTreeClassifier, Pipeline, SimpleImputer, StandardScaler, OneHotEncoder, ColumnTransformer, GridSearchCV

Data source/summary:

Part 1: 569 instances with 30 numeric attributes. Class distribution: 212 Malignant, 357 Benign. Follow the link below for the full description of the dataset: https://scikit-learn.org/stable/datasets/#breast-cancer-wisconsin-diagnostic-dataset
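
As a rough starting point for Part 1, the sketch below loads the dataset through scikit-learn, makes a train-test split, and reports several scores for a few Decision Tree settings. The 25% test size, the `random_state` values, and the particular `criterion`/`max_depth` combinations are assumptions chosen for illustration; they are not part of the exercise statement.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Load the 569 x 30 feature matrix; scikit-learn encodes 0 = malignant, 1 = benign.
X, y = load_breast_cancer(return_X_y=True)

# Assumed split: 25% held out for testing, stratified to keep the class ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Try a few tree configurations (illustrative choices, not prescribed by the exercise).
scores = {}
for criterion in ("gini", "entropy"):
    for max_depth in (3, None):
        clf = DecisionTreeClassifier(
            criterion=criterion, max_depth=max_depth, random_state=0
        )
        clf.fit(X_train, y_train)
        y_pred = clf.predict(X_test)
        # precision/recall/f1 use the default positive label 1 (benign).
        scores[(criterion, max_depth)] = {
            "accuracy": accuracy_score(y_test, y_pred),
            "precision": precision_score(y_test, y_pred),
            "recall": recall_score(y_test, y_pred),
            "f1": f1_score(y_test, y_pred),
        }

for config, result in scores.items():
    print(config, {name: round(value, 3) for name, value in result.items()})
```

To score the malignant class instead, pass `pos_label=0` to the precision, recall, and F1 functions.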

Part 2: income.csv is used as the training set. 32561 instances with 14 attributes, 6 numeric (e.g. age, capital gain, hours-per-week) and 8 categorical (e.g. workclass, education, race).

income_test.csv is used for testing and reporting scores. 15315 instances with 14 attributes, 6 numeric (e.g. age, capital gain, hours-per-week) and 8 categorical (e.g. workclass, education, race).

The goal is to predict whether income exceeds $50K/yr based on census data. Link: http://archive.ics.uci.edu/ml/datasets/Adult
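
One possible shape for the Part 2 pipeline is sketched below. Because income.csv is not reproduced here, the code runs on a small synthetic DataFrame: the column subset, the synthetic label, the injected missing values, and the parameter grid are all assumptions for illustration only. With the real data, the frame would presumably come from `pd.read_csv` on income.csv, and the fitted search would be scored on income_test.csv.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for income.csv (hypothetical values, 4 of the 14 attributes).
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "age": rng.integers(18, 70, size=n).astype(float),
    "hours-per-week": rng.integers(10, 60, size=n).astype(float),
    "workclass": rng.choice(["Private", "State-gov", "Self-emp"], size=n),
    "education": rng.choice(["HS-grad", "Bachelors", "Masters"], size=n),
})
# Made-up label standing in for "income exceeds $50K/yr".
y = ((df["age"] > 40) & (df["hours-per-week"] > 35)).astype(int)
# Inject a few missing values so the imputers have something to do.
df.loc[::17, "age"] = np.nan
df.loc[::23, "workclass"] = np.nan

numeric_features = ["age", "hours-per-week"]
categorical_features = ["workclass", "education"]

# Numeric columns: impute, then scale. Categorical columns: impute, then one-hot encode.
numeric_pipe = Pipeline([
    ("imputer", SimpleImputer(strategy="median")),
    ("scaler", StandardScaler()),
])
categorical_pipe = Pipeline([
    ("imputer", SimpleImputer(strategy="most_frequent")),
    ("encoder", OneHotEncoder(handle_unknown="ignore")),
])
preprocess = ColumnTransformer([
    ("num", numeric_pipe, numeric_features),
    ("cat", categorical_pipe, categorical_features),
])
model = Pipeline([
    ("preprocess", preprocess),
    ("tree", DecisionTreeClassifier(random_state=0)),
])

# Grid keys follow the step names above, joined with double underscores.
param_grid = {
    "preprocess__num__imputer__strategy": ["mean", "median"],
    "tree__max_depth": [3, 5, None],
    "tree__min_samples_leaf": [1, 5],
}
search = GridSearchCV(model, param_grid, cv=3, scoring="accuracy")
search.fit(df, y)

print(search.best_params_)
print(round(search.best_score_, 3))
```

Keeping the preprocessing inside the Pipeline means the imputers, scaler, and encoder are refit on each cross-validation training fold, so no statistics leak from the validation fold into the grid search.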

(Optional) Further Links/Credits to Relevant Resources:

This exercise was assigned in the machine learning course at Aristotle University of Thessaloniki, and the solution is my submission for it.

iakovidva, Sep 26 '20 17:09

@iakovidva Great idea! Please feel free to work on it and create a PR when you are done. Do check out the other existing projects and the contributing guidelines to get a sense of the practices and format. Please let us know if you have any questions. Thanks!

gimseng, Sep 28 '20 12:09