
Added AutoML

earino opened this issue 7 years ago · 7 comments

As we've discussed in Slack, H2O has recently released some very interesting AutoML functionality. In this case, the leader is the StackedEnsemble generated from a GBM grid, a DL grid, a DRF model, and an XRT model. On 100k records it trained for a while on some small cloud hardware and generated a respectable AUC of 0.7284624:
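For context, a minimal sketch of the kind of call that produces the `md` object printed below; the file paths and the `dep_delayed_15min` response column are assumptions based on the benchm-ml airline data, not details from the run above:

```r
library(h2o)
h2o.init()

# assumed benchm-ml airline splits; substitute your own paths
dx_train <- h2o.importFile("train-0.1m.csv")
dx_test  <- h2o.importFile("test.csv")

y <- "dep_delayed_15min"              # assumed response column
x <- setdiff(names(dx_train), y)

# trains a GBM grid, a DL grid, DRF, XRT, GLM, and stacks the results
md <- h2o.automl(x = x, y = y, training_frame = dx_train)
md                                    # prints the H2OAutoML object, as below
```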

> md
An object of class "H2OAutoML"
Slot "project_name":
[1] "<default>"

Slot "leader":
Model Details:
==============

H2OBinomialModel: stackedensemble
Model ID:  StackedEnsemble_model_1496028880431_2818 
NULL


H2OBinomialMetrics: stackedensemble
** Reported on training data. **

MSE:  0.06495612
RMSE:  0.2548649
LogLoss:  0.2435769
Mean Per-Class Error:  0.07056041
AUC:  0.9872952
Gini:  0.9745905

Confusion Matrix (vertical: actual; across: predicted) for F1-optimal threshold:
           N     Y    Error         Rate
N      54777  1849 0.032653  =1849/56626
Y       1450 11918 0.108468  =1450/13368
Totals 56227 13767 0.047133  =3299/69994

Maximum Metrics: Maximum metrics at their respective thresholds
                        metric threshold    value idx
1                       max f1  0.299564 0.878423 218
2                       max f2  0.243801 0.912848 242
3                 max f0point5  0.362489 0.896238 193
4                 max accuracy  0.313673 0.953653 213
5                max precision  0.974294 1.000000   0
6                   max recall  0.132957 1.000000 309
7              max specificity  0.974294 1.000000   0
8             max absolute_mcc  0.299564 0.849339 218
9   max min_per_class_accuracy  0.253667 0.943118 237
10 max mean_per_class_accuracy  0.247323 0.944984 240

Gains/Lift Table: Extract with `h2o.gainsLift(<model>, <data>)` or `h2o.gainsLift(<model>, valid=<T/F>, xval=<T/F>)`
H2OBinomialMetrics: stackedensemble
** Reported on validation data. **

MSE:  0.1327237
RMSE:  0.3643127
LogLoss:  0.4226191
Mean Per-Class Error:  0.3271404
AUC:  0.7433911
Gini:  0.4867822

Confusion Matrix (vertical: actual; across: predicted) for F1-optimal threshold:
           N    Y    Error         Rate
N       9287 2974 0.242558  =2974/12261
Y       1166 1666 0.411723   =1166/2832
Totals 10453 4640 0.274299  =4140/15093

Maximum Metrics: Maximum metrics at their respective thresholds
                        metric threshold    value idx
1                       max f1  0.196506 0.445931 257
2                       max f2  0.114152 0.591573 329
3                 max f0point5  0.307013 0.439652 188
4                 max accuracy  0.579457 0.822434  82
5                max precision  0.950060 1.000000   0
6                   max recall  0.048541 1.000000 396
7              max specificity  0.950060 1.000000   0
8             max absolute_mcc  0.272812 0.299325 207
9   max min_per_class_accuracy  0.165504 0.672539 281
10 max mean_per_class_accuracy  0.156244 0.677032 289

Gains/Lift Table: Extract with `h2o.gainsLift(<model>, <data>)` or `h2o.gainsLift(<model>, valid=<T/F>, xval=<T/F>)`


Slot "leaderboard":
                                             model_id      auc  logloss
1            StackedEnsemble_model_1496028880431_2818 0.742023 0.424990
2  GBM_grid__a70036165806366cd146a852765f4af0_model_3 0.724540 0.472045
3  GBM_grid__a70036165806366cd146a852765f4af0_model_1 0.722181 0.438297
4  GBM_grid__a70036165806366cd146a852765f4af0_model_0 0.720750 0.475918
5                           DRF_model_1496028880431_4 0.718733 0.471836
6                         XRT_model_1496028880431_366 0.718564 0.439938
7   DL_grid__a70036165806366cd146a852765f4af0_model_0 0.715729 0.453427
8   DL_grid__a70036165806366cd146a852765f4af0_model_1 0.715312 0.453516
9  GBM_grid__a70036165806366cd146a852765f4af0_model_8 0.712989 0.443795
10 GBM_grid__a70036165806366cd146a852765f4af0_model_4 0.711725 0.457926
11  DL_grid__a70036165806366cd146a852765f4af0_model_2 0.711247 0.472706
12 GLM_grid__a70036165806366cd146a852765f4af0_model_0 0.709769 0.443991
13 GLM_grid__a70036165806366cd146a852765f4af0_model_1 0.709769 0.443991
14 GBM_grid__a70036165806366cd146a852765f4af0_model_6 0.705461 0.468157
15 GBM_grid__a70036165806366cd146a852765f4af0_model_2 0.703969 0.444650
16 GBM_grid__a70036165806366cd146a852765f4af0_model_5 0.697802 0.483724
17  DL_grid__a70036165806366cd146a852765f4af0_model_4 0.691404 0.497545
18 GBM_grid__a70036165806366cd146a852765f4af0_model_7 0.668311 0.897990
19  DL_grid__a70036165806366cd146a852765f4af0_model_3 0.658246 0.647369

earino avatar May 29 '17 07:05 earino

Ensembles (the new Java implementation) + AutoML have been on my list to look at (I've already done some of that).

However, I think I should keep this repo to the basic algos only and create new repos for looking at things built on top of those. (Also, 99% of the training time in ensembles/AutoML is spent in the building blocks, so there is not much to benchmark on speed, while the increase in AUC will be very much dataset-dependent.)

I already included ensembles in the course I'm teaching at UCLA, see here.

I might create a repo for AutoML, though that's also trivial; the code above changes only 2 lines vs. the original. I would probably run it on 1M records, though.

I actually already factored out GBMs from this benchmark in order to keep up with the newest/best tools (added LightGBM) and forget about mediocre tools such as Spark. This new repo will have a more targeted focus (only 1M/10M records and only the best GBM tools), but I might be able to update it with new versions more regularly (+ add GPUs).

szilard avatar May 30 '17 03:05 szilard

PS: I also started a deep learning repo a few months ago, but did not get too far (yet).

szilard avatar May 30 '17 03:05 szilard

Following @ledell's advice, the code now gives an AUC of 0.7286668, so some improvement, but not drastic on the 100k-row dataset. I'm running it on the 1M overnight.

earino avatar May 30 '17 04:05 earino

@earino How long did you run it for? If it was the default, then it probably ran for 10 minutes. We changed the default to 1 hour very recently, so if you re-run on a newer version, you should make a note of the change. In your results above, it looks like StackedEnsemble_model_1496028880431_2818 had a test AUC of ~0.74, not ~0.72...?
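For reference, a minimal sketch of pinning the time budget explicitly rather than relying on the (version-dependent) default; `max_runtime_secs` is the H2O AutoML argument, and `x`, `y`, and `dx_train` are assumed from the sketch earlier in the thread:

```r
# train for exactly 1 hour regardless of the installed version's default
md <- h2o.automl(x = x, y = y, training_frame = dx_train,
                 max_runtime_secs = 3600)
```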

ledell avatar May 30 '17 05:05 ledell

I believe I'm running off the nightly build, or at least a very recent one. This is the exact run; it took 1 hour, 1 minute, and 16 seconds @ledell -> https://app.dominodatalab.com/u/earino/AutoML/runs/592cf961f5f40862c7badf99

It's the output of h2o.performance that I'm looking at.


earino avatar May 30 '17 14:05 earino

@ledell Very explicitly, this is the exact line I'm using to get the performance number. Is it the wrong thing? `print(h2o.auc(h2o.performance(md@leader, dx_test)))`

earino avatar May 30 '17 17:05 earino

@earino That line will also work, but it requires re-computing all the performance metrics on the test set. They are already computed as part of the h2o.automl() function and stored in the Leaderboard.
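A minimal sketch of reading the already-computed metrics off the leaderboard instead, assuming `md` is the H2OAutoML object from the thread:

```r
# leaderboard metrics were computed during h2o.automl(); no re-scoring needed
print(md@leaderboard)

# the leader's AUC is the first row of the leaderboard frame
lb <- as.data.frame(md@leaderboard)
print(lb$auc[1])
```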

ledell avatar May 31 '17 20:05 ledell