Multiclass Bonanza

Aylr opened this issue 7 years ago • 7 comments

Buckle in. This is a big PR.

Why is this so big?

Multi-class implementation required substantial thought, especially around metrics. We had some excellent groundwork from @ShufangCi's effort to implement neural nets, and it seemed appropriate to take the time to get the multi-class foundation solid before tackling neural nets.

While this effort was underway, quite a few other bug fixes and needed enhancements were thrown in (sorry, I normally try to keep PRs as small as possible).

Please see issue #372 for extra details.

Issues closed in this PR

  • #372
  • #144
  • #374
  • #393
  • #416
  • #415

Additional Caveats

  • There is a known bug #414 that will cause 3 tests to fail intermittently. When one fails, they all fail. They are:
    • test_logistic_regression_no_tuning
    • test_random_forest_no_tuning
    • test_random_forest_tuning
  • example_regression_1.py uses the diabetes dataset, which has missing data in the BP column. To get it to run, fill that column by adding this line right after you load the dataframe: dataframe['SystolicBPNBR'] = dataframe['SystolicBPNBR'].fillna(149) (runnable snippet below). Note that it is silly to predict that column, so we need a better dataset for the regression examples. I'd love your thoughts on this.
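
For convenience, here's the workaround as a snippet (assuming the example loads its data with healthcareai.load_diabetes(); 149 is just the fill value suggested above, not a clinically meaningful one):

import healthcareai

# load_diabetes() is assumed to be the example's loader; the fill value of
# 149 is the one suggested above, not a clinically meaningful choice.
dataframe = healthcareai.load_diabetes()
dataframe['SystolicBPNBR'] = dataframe['SystolicBPNBR'].fillna(149)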

What's Included aka What to Try to Break Please

  • [x] New top-level load_dermatology() function that returns a dataframe with a multiclass derm dataset (exercised in the sketch after this list)
  • [x] New properties on SupervisedModelTrainer: .class_labels and .number_of_classes
  • [x] Nice printout when training that shows how many and which classes for classification tasks
  • [ ] Confusion matrices available via console TrainedSupervisedModel.print_confusion_matrix() or plots TrainedSupervisedModel.confusion_matrix_plot() #374
  • [ ] More robust metrics that deal with binary or multi-class support better
  • [ ] Changed all classification scoring defaults from roc_auc (binary only) to accuracy (binary and multiclass)
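
Here's a rough sketch of how the pieces above fit together (the dermatology column names are placeholders, and exact signatures may shift before merge):

import healthcareai

# Sketch only: 'target_num' and 'PatientID' stand in for the dermatology
# dataset's actual target and grain columns.
dataframe = healthcareai.load_dermatology()

trainer = healthcareai.SupervisedModelTrainer(
    dataframe=dataframe,
    predicted_column='target_num',
    model_type='classification',
    grain_column='PatientID',
    impute=True,
    verbose=False)

print(trainer.class_labels)        # new property: the distinct class labels
print(trainer.number_of_classes)   # new property: how many classes there are

trained_model = trainer.logistic_regression()
trained_model.print_confusion_matrix()  # console confusion matrix (#374)
trained_model.confusion_matrix_plot()   # plotted confusion matrix (#374)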

Special Bonus Things to Try to Break Also

  • [ ] Optional binary_positive_label argument for binary classification tasks (see the sketch after this list)
  • [ ] Removed need for binary classification tasks to have a 'Y'/'N' in the prediction column. This can now be anything, and healthcareai tries to guess which is the positive class if it is not specified. Positive class is displayed in console output and on ROC/PR plots.
  • [ ] DataframeNullValueFilter now raises a helpful error that identifies columns that are entirely null.
  • [ ] SupervisedModelTrainer now warns users about columns/features with high and low cardinality. #144
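
A sketch of the positive-label option (where the argument lives is my assumption here; placing it on the trainer may not match the final implementation):

import healthcareai

# Sketch only: binary_positive_label placement on the trainer is assumed.
# If it is omitted, healthcareai tries to guess the positive class.
dataframe = healthcareai.load_diabetes()

trainer = healthcareai.SupervisedModelTrainer(
    dataframe=dataframe,
    predicted_column='ThirtyDayReadmitFLG',
    model_type='classification',
    grain_column='PatientEncounterID',
    binary_positive_label='Y',   # optional; must match a value in the column
    impute=True,
    verbose=False)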

Aylr avatar Sep 27 '17 12:09 Aylr

Codecov Report

:exclamation: No coverage uploaded for pull request base (master@4e415b7). The diff coverage is 39.79%.

Impacted file tree graph

@@            Coverage Diff            @@
##             master     #386   +/-   ##
=========================================
  Coverage          ?   52.69%           
=========================================
  Files             ?       38           
  Lines             ?     2211           
  Branches          ?        0           
=========================================
  Hits              ?     1165           
  Misses            ?     1046           
  Partials          ?        0
Impacted Files                                  Coverage Δ
healthcareai/common/healthcareai_error.py      40% <ø> (ø)
healthcareai/pipelines/data_preparation.py     83.33% <ø> (ø)
healthcareai/common/helpers.py                 96.96% <ø> (ø)
healthcareai/datasets/base.py                  45.45% <0%> (ø)
healthcareai/__init__.py                       42.85% <0%> (ø)
healthcareai/common/filters.py                 29.62% <0%> (ø)
healthcareai/datasets/__init__.py              33.33% <0%> (ø)
healthcareai/tests/test_dataframe_filters.py   59.13% <0%> (ø)
healthcareai/tests/helpers.py                  66.66% <100%> (ø)
healthcareai/common/transformers.py            32.45% <25%> (ø)
... and 9 more

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 4e415b7...65cfb43.

codecov-io avatar Sep 27 '17 16:09 codecov-io

If the user specifies the binary_positive_label parameter, I think the probabilities still need to be adjusted. (Setting binary_positive_label = 'N' in the classification example leads to some weird ROC curves.)

I think adding the following lines to compute_classification_metrics, right after positive_label is assigned, should fix this issue:

# If the user-specified positive_label does not match the default, flip the probabilities
if positive_label == y_test.unique()[0]:
    probability_predictions = 1 - probability_predictions

Here is an example of what I think is going on right now. Suppose I have the following data as ordered pairs of (true class, probability of red): (green, 0.3), (red, 0.9), (red, 0.85), (green, 0.2). By default, your code will convert this to (0, 0.3), (1, 0.9), (1, 0.85), (0, 0.2). But if the user sets binary_positive_label = 'green', they will get (1, 0.3), (0, 0.9), (0, 0.85), (1, 0.2). Instead, we want to also convert the probabilities, to get (1, 0.7), (0, 0.1), (0, 0.15), (1, 0.8).
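
A quick numeric check of that, in plain scikit-learn just for illustration: flipping the labels without flipping the probabilities inverts the AUC, while flipping both recovers it.

import numpy as np
from sklearn.metrics import roc_auc_score

# True labels and model probabilities of 'red' from the example above.
y_true = np.array(['green', 'red', 'red', 'green'])
p_red = np.array([0.3, 0.9, 0.85, 0.2])

# Positive class 'red': encode red=1 and score with p_red directly.
print(roc_auc_score((y_true == 'red').astype(int), p_red))        # 1.0

# Positive class 'green' WITHOUT flipping the probabilities: AUC inverts.
print(roc_auc_score((y_true == 'green').astype(int), p_red))      # 0.0

# Positive class 'green' WITH the probabilities flipped: correct again.
print(roc_auc_score((y_true == 'green').astype(int), 1 - p_red))  # 1.0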

yvanhuele avatar Sep 27 '17 20:09 yvanhuele

My first pass through this.

  • Ran the examples, they look great. I love the confusion matrix plot.
  • Loaded up a dataset with 14 classes and 31k rows.
    • The string 'NULL' is the missing value marker, but I get 0% imputation reported
    • KNN is slooooooooow. It trains by default with neighbors 5:25 and 2 weights! I couldn't wait.
    • RF is slooooooooow. It looks like it's a default of 200 trees and 5 random searches?
    • LR worked great. Fast, 93% accuracy (vs. R's 92%).
  • I'm assuming some of these problems will go away once I know the package a bit better.

Minor suggestions:

  • Should the counts confusion matrix be color coded? For example, say I misclassify 10/12 for a very small category.
    • On the proportion matrix, it looks very bad: 0.9 wrong! But that category is only 0.01% of the total data. Is there a way to balance that?
    • On the counts matrix, I have no quick way of telling how much of the total it is. Color coding it might help? (See the sketch below.)
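
One way to balance the two views (not package code, just a sketch of the idea): annotate each cell of the row-normalized matrix with both the proportion and the raw count.

import numpy as np

# Hypothetical counts: rows are true classes, columns are predicted classes.
# The small class has 10/12 misclassified, as in the example above.
counts = np.array([[2, 10],
                   [30, 958]])

# Row-normalize so each cell is a proportion of its true class...
proportions = counts / counts.sum(axis=1, keepdims=True)

# ...then print both numbers per cell, so '0.83 wrong' on a class that is
# ~1% of the data reads differently from '0.83 wrong' on a dominant class.
for i, row in enumerate(proportions):
    print('  '.join('{:.2f} (n={})'.format(p, counts[i, j])
                    for j, p in enumerate(row)))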

mmastand avatar Sep 28 '17 15:09 mmastand

@mmastand Great review!

I've decreased the hyperparameter space by about 2/3. It was also using a randomized grid search with 10 iterations, which I've decreased to 5. We should think about choosing sensible values and offering guidance.

The 10x slowdown on RF is totally expected given that it was running 10 models to explore the hyperparameters. Decreased to 5 by default.

On the non-normalized CM plot, I like the idea of row-wise color coding, but that would need some significant explanation, as a user would assume that color changes apply across the whole graph. Open to experimentation here.

I intentionally did not implement console output on the advanced trainer (MVP). Created a new issue, since it isn't multiclass-specific: #391

Aylr avatar Oct 06 '17 01:10 Aylr

Still fixing some problems that a few tests brought to light.

Aylr avatar Oct 25 '17 13:10 Aylr

  • Confusion matrices available via console TrainedSupervisedModel.print_confusion_matrix() or plots TrainedSupervisedModel.confusion_matrix_plot(). Fixed in #374. It does indeed print these from the command line. I like it.

  • More robust metrics that deal with binary or multi-class support better; changed all classification scoring defaults from roc_auc (binary only) to accuracy (binary and multiclass). So this will default to using accuracy when comparing binary classification ensembles? I could see that being dangerous with imbalanced classes. #424

  • Optional binary_positive_label argument for binary classification tasks

    • This worked when I flipped it.
    • Worked when I specified it as the correct value.
    • Good job with that error when I put "Naw Dawg" as the positive label.
  • Removed need for binary classification tasks to have a 'Y'/'N' in the prediction column. This can now be anything, and healthcareai tries to guess which is the positive class if it is not specified. Positive class is displayed in console output and on ROC/PR plots. The following assigns Waffles as the positive label and somehow prints the ROC curve right side up:

dataframe['ThirtyDayReadmitFLG'].replace('Y', 'SnoCones', inplace=True)
dataframe['ThirtyDayReadmitFLG'].replace('N', 'Waffles', inplace=True)

Specifying binary_positive_label='SnoCones' prints the ROC curve upside down. Adding a third target class switches it to multiclass. Like it. Good error when calling ROC. CM methods work, but the algorithm appears not to:

dataframe['ThirtyDayReadmitFLG'].replace('Y', 'SnoCones', inplace=True)
dataframe['ThirtyDayReadmitFLG'].replace('N', 'Waffles', inplace=True)
dataframe.loc[0:5, 'ThirtyDayReadmitFLG'] = "Omelette"

classification_trainer = healthcareai.SupervisedModelTrainer(
    dataframe=dataframe,
    predicted_column='ThirtyDayReadmitFLG',
    model_type='classification',
    grain_column='PatientEncounterID',
    impute=True,
    verbose=False)

Yields:

    - LogisticRegression selected performance metrics:
      accuracy: 0.85

    Confusion Matrix (Counts)
        - Predicted Classes are along the top
        - True Classes are along the left.

                SnoCones    Waffles    Omelette
    --------  ----------  ---------  ----------
    SnoCones           0         27           0
    Waffles            0        171           0
    Omelette           0          2           0

Filed as #435.

  • DataframeNullValueFilter now raises a helpful error that identifies columns that are entirely null. Added a column of nulls; it errored out as expected. But first, it imputed 2094.34% of my data. What the what? Fixed in #429

  • SupervisedModelTrainer now warns users about columns/features with high and low cardinality. Love the warnings. Very clear.

Other things I did

  • Ran the multiclass example. RF is still very slow in the example. If you can't figure it out, I'd comment out RF/ensemble so that people can run the example.

  • Typo in the imputation error messages: "Numeric imputation will always occur when makingpredictions on new data" (missing space). Fixed in #429

  • I wrote a multiclass_advanced example that runs with RF. You have to specify a smaller number of grid features to search over, and it seems to work just fine (see the sketch below).
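
Since the advanced trainer's exact API isn't shown in this thread, here's a package-agnostic sketch of the same idea in plain scikit-learn: shrink the search space and the iteration count so the randomized search stays fast.

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = load_iris(return_X_y=True)

# A deliberately small hyperparameter space and few iterations, so the
# randomized search stays tractable on larger multiclass data.
search = RandomizedSearchCV(
    RandomForestClassifier(),
    param_distributions={'n_estimators': [50, 100, 200],
                         'max_features': ['sqrt', 'log2']},
    n_iter=5,
    cv=3)
search.fit(X, y)
print(search.best_params_)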

Factor Testing Stuff

  • Checked that a single row prediction works. My deploy dataset does not have the target column. That throws a "Warning! Missing category: NaN" warning. This warning should ignore the target column. Fixed in #433
  • If I add the target column to the dataset with any value in it, the extra category is correctly parsed.
  • Added new factor levels to a column. developSet contained (Hot, Cold); deploySet contained (Frozen, NaN, Cold). I got the warning about new categories for both Frozen and NaN. I think this warning should not flag NaN as a new category. #434
  • A single row prediction with NA values does not work, as the column has no variance and is removed.
  • Trained a multiclass model on :300 of dermatology and deployed on 301:. Throws an error in prediction: ValueError: Length of values does not match length of index. Am I missing something? How is it figuring out what potential categories to pick from? #432 (Minimal repro sketch below.)
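
For reference, a minimal repro of that last bullet (column names are placeholders, and make_predictions() is my assumption for the deploy call based on the package's other examples):

import healthcareai

dataframe = healthcareai.load_dermatology()
develop_set = dataframe.iloc[:300]   # rows used for training
deploy_set = dataframe.iloc[301:]    # rows used for prediction

trainer = healthcareai.SupervisedModelTrainer(
    dataframe=develop_set,
    predicted_column='target_num',   # placeholder column name
    model_type='classification',
    grain_column='PatientID',        # placeholder column name
    impute=True)

trained_model = trainer.logistic_regression()

# This prediction step is where the ValueError described above appeared.
predictions = trained_model.make_predictions(deploy_set)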

mmastand avatar Nov 03 '17 18:11 mmastand

This is all done except for #424. Ping me when you're ready for review.

mmastand avatar Dec 11 '17 18:12 mmastand