scikit-learn-intelex
[RF] Random Forest hangs during fit on KDDCup09-Upselling
Describe the bug
The scikit-learn-intelex Random Forest hangs on the KDDCup09 dataset (a large, 2 GB binary classification dataset in AutoMLBenchmark):
- 45000 rows
- 1260 float columns
- 6223 int columns
- 5797 boolean columns
- 157 category columns
To Reproduce
Steps to reproduce the behavior:
- Fit the intelex Random Forest on the KDDCup09-Upselling dataset
- The process hangs (or takes a very long time to train)
Hyperparameters:
params = {
'n_estimators': 300,
'n_jobs': -1,
'random_state': 0,
}
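The sketch below is a minimal standalone timing harness for these hyperparameters. It assumes the data has already been turned into a numeric matrix `X` and a label vector `y`. The name `fit_seconds` is illustrative and is not part of either library. `sklearnex.patch_sklearn()` has to run before the scikit-learn estimator is imported, which is why the imports sit inside the function.

```python
import time

# Hyperparameters taken from the report above.
PARAMS = {"n_estimators": 300, "n_jobs": -1, "random_state": 0}


def fit_seconds(X, y, use_intelex: bool) -> float:
    """Fit a RandomForestClassifier with PARAMS and return the wall-clock fit time.

    Hypothetical helper for reproducing the report. Patching is process-wide,
    so run the stock and the intelex variant in separate fresh processes.
    """
    if use_intelex:
        # Must happen before RandomForestClassifier is imported below.
        from sklearnex import patch_sklearn

        patch_sklearn()
    from sklearn.ensemble import RandomForestClassifier

    model = RandomForestClassifier(**PARAMS)
    start = time.perf_counter()
    model.fit(X, y)
    return time.perf_counter() - start
```

Calling `fit_seconds(X, y, use_intelex=True)` in one process and `fit_seconds(X, y, use_intelex=False)` in another should give timings that can be compared directly.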
Note: You will likely need to preprocess the dataset before passing it to RF. I am using AutoGluon to do this automatically, since I'm testing the scikit-learn-intelex RF integration. This may be easier to test in a couple of weeks, after AutoGluon v0.4.0 is released, as it will include a toggle to enable intelex RF.
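Without AutoGluon, a rough stand-in for that preprocessing step is to turn every column into a single float32 column. The encoding below is an assumption made for reproduction purposes and is not what AutoGluon actually does. It maps booleans to 0/1, maps categoricals to integer codes (with -1 for missing values), and fills numeric NaNs with the column median. `to_numeric_matrix` is a hypothetical name.

```python
import numpy as np
import pandas as pd


def to_numeric_matrix(df: pd.DataFrame) -> np.ndarray:
    """Encode a mixed-type frame as a dense float32 matrix (hypothetical encoding).

    - bool columns     -> 0.0 / 1.0
    - category/object  -> integer category codes, missing values -> -1
    - numeric columns  -> float32, NaN replaced by the column median
    """
    encoded = []
    for name in df.columns:
        col = df[name]
        if pd.api.types.is_bool_dtype(col):
            values = col.astype(np.float32).to_numpy()
        elif isinstance(col.dtype, pd.CategoricalDtype) or pd.api.types.is_object_dtype(col):
            values = col.astype("category").cat.codes.to_numpy().astype(np.float32)
        else:
            values = col.astype(np.float32).fillna(col.median()).to_numpy()
        encoded.append(values)
    # Stack column arrays directly instead of growing a fragmented DataFrame.
    return np.column_stack(encoded).astype(np.float32, copy=False)
```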
Expected behavior
Without intelex (plain scikit-learn with the same hyperparameters), the model trains quickly and without issues:
[INFO] [amlb.print:22:38:38.512] Fitting model: RandomForestGini ... Training model for up to 2729.8s of the 2728.42s of remaining time.
[INFO] [amlb.utils.process:22:39:43.046] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] CPU Utilization: 55.4%
[INFO] [amlb.utils.process:22:39:43.046] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] Memory Usage: 47.4%
[INFO] [amlb.utils.process:22:39:43.047] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] Disk Usage: 14.8%
[INFO] [amlb.print:22:40:12.417] 0.9097 = Validation score (roc_auc)
[INFO] [amlb.print:22:40:12.417] 93.4s = Training runtime
[INFO] [amlb.print:22:40:12.417] 0.42s = Validation runtime
Output/Screenshots
[INFO] [amlb.print:01:54:30.498] Fitting model: RandomForestGini ... Training model for up to 2736.42s of the 2735.17s of remaining time.
[INFO] [amlb.print:01:54:54.145] sklearn.ensemble.RandomForestClassifier.fit: running accelerated version on CPU
[INFO] [amlb.utils.process:01:55:52.149] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] CPU Utilization: 41.1%
[INFO] [amlb.utils.process:01:55:52.149] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] Memory Usage: 55.4%
[INFO] [amlb.utils.process:01:55:52.150] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] Disk Usage: 14.9%
[INFO] [amlb.utils.process:01:57:52.150] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] CPU Utilization: 50.0%
[INFO] [amlb.utils.process:01:57:52.150] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] Memory Usage: 55.4%
[INFO] [amlb.utils.process:01:57:52.151] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] Disk Usage: 14.9%
[INFO] [amlb.utils.process:01:59:52.151] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] CPU Utilization: 50.0%
[INFO] [amlb.utils.process:01:59:52.151] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] Memory Usage: 55.4%
[INFO] [amlb.utils.process:01:59:52.152] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] Disk Usage: 14.9%
[INFO] [amlb.utils.process:02:01:52.152] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] CPU Utilization: 50.0%
[INFO] [amlb.utils.process:02:01:52.152] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] Memory Usage: 55.4%
[INFO] [amlb.utils.process:02:01:52.153] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] Disk Usage: 14.9%
[INFO] [amlb.utils.process:02:03:52.153] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] CPU Utilization: 50.0%
[INFO] [amlb.utils.process:02:03:52.153] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] Memory Usage: 55.4%
[INFO] [amlb.utils.process:02:03:52.154] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] Disk Usage: 14.9%
[INFO] [amlb.utils.process:02:05:52.154] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] CPU Utilization: 50.0%
[INFO] [amlb.utils.process:02:05:52.154] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] Memory Usage: 55.4%
[INFO] [amlb.utils.process:02:05:52.155] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] Disk Usage: 14.9%
[INFO] [amlb.utils.process:02:07:52.155] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] CPU Utilization: 50.0%
[INFO] [amlb.utils.process:02:07:52.156] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] Memory Usage: 55.4%
[INFO] [amlb.utils.process:02:07:52.156] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] Disk Usage: 14.9%
[INFO] [amlb.utils.process:02:09:52.156] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] CPU Utilization: 50.0%
[INFO] [amlb.utils.process:02:09:52.157] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] Memory Usage: 55.4%
[INFO] [amlb.utils.process:02:09:52.157] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] Disk Usage: 14.9%
[INFO] [amlb.utils.process:02:11:52.157] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] CPU Utilization: 50.0%
[INFO] [amlb.utils.process:02:11:52.158] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] Memory Usage: 55.4%
[INFO] [amlb.utils.process:02:11:52.158] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] Disk Usage: 14.9%
[INFO] [amlb.utils.process:02:13:52.158] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] CPU Utilization: 50.0%
[INFO] [amlb.utils.process:02:13:52.159] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] Memory Usage: 55.4%
[INFO] [amlb.utils.process:02:13:52.159] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] Disk Usage: 14.9%
[INFO] [amlb.utils.process:02:15:52.159] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] CPU Utilization: 50.0%
[INFO] [amlb.utils.process:02:15:52.160] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] Memory Usage: 55.4%
[INFO] [amlb.utils.process:02:15:52.160] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] Disk Usage: 14.9%
[INFO] [amlb.utils.process:02:17:52.160] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] CPU Utilization: 50.0%
[INFO] [amlb.utils.process:02:17:52.160] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] Memory Usage: 55.4%
[INFO] [amlb.utils.process:02:17:52.161] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] Disk Usage: 14.9%
[INFO] [amlb.utils.process:02:19:52.161] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] CPU Utilization: 50.0%
[INFO] [amlb.utils.process:02:19:52.161] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] Memory Usage: 55.4%
[INFO] [amlb.utils.process:02:19:52.162] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] Disk Usage: 14.9%
[INFO] [amlb.utils.process:02:21:52.162] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] CPU Utilization: 50.0%
[INFO] [amlb.utils.process:02:21:52.162] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] Memory Usage: 55.4%
[INFO] [amlb.utils.process:02:21:52.162] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] Disk Usage: 14.9%
[INFO] [amlb.utils.process:02:23:52.163] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] CPU Utilization: 50.0%
[INFO] [amlb.utils.process:02:23:52.163] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] Memory Usage: 55.4%
[INFO] [amlb.utils.process:02:23:52.163] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] Disk Usage: 14.9%
[INFO] [amlb.utils.process:02:25:52.164] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] CPU Utilization: 50.0%
[INFO] [amlb.utils.process:02:25:52.164] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] Memory Usage: 55.4%
[INFO] [amlb.utils.process:02:25:52.164] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] Disk Usage: 14.9%
[INFO] [amlb.utils.process:02:27:52.165] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] CPU Utilization: 50.0%
[INFO] [amlb.utils.process:02:27:52.165] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] Memory Usage: 55.4%
[INFO] [amlb.utils.process:02:27:52.165] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] Disk Usage: 14.9%
[INFO] [amlb.utils.process:02:29:52.166] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] CPU Utilization: 50.0%
[INFO] [amlb.utils.process:02:29:52.166] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] Memory Usage: 55.4%
[INFO] [amlb.utils.process:02:29:52.166] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] Disk Usage: 14.9%
[INFO] [amlb.utils.process:02:31:52.167] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] CPU Utilization: 50.0%
[INFO] [amlb.utils.process:02:31:52.167] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] Memory Usage: 55.4%
[INFO] [amlb.utils.process:02:31:52.167] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] Disk Usage: 14.9%
[INFO] [amlb.utils.process:02:33:52.168] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] CPU Utilization: 50.0%
[INFO] [amlb.utils.process:02:33:52.168] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] Memory Usage: 55.4%
[INFO] [amlb.utils.process:02:33:52.168] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] Disk Usage: 14.9%
[INFO] [amlb.utils.process:02:35:52.169] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] CPU Utilization: 50.0%
[INFO] [amlb.utils.process:02:35:52.169] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] Memory Usage: 55.4%
[INFO] [amlb.utils.process:02:35:52.169] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] Disk Usage: 14.9%
[INFO] [amlb.utils.process:02:37:52.170] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] CPU Utilization: 50.0%
[INFO] [amlb.utils.process:02:37:52.170] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] Memory Usage: 55.4%
[INFO] [amlb.utils.process:02:37:52.170] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] Disk Usage: 14.9%
[INFO] [amlb.utils.process:02:39:52.170] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] CPU Utilization: 50.0%
[INFO] [amlb.utils.process:02:39:52.171] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] Memory Usage: 55.4%
[INFO] [amlb.utils.process:02:39:52.171] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] Disk Usage: 14.9%
[INFO] [amlb.utils.process:02:41:52.171] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] CPU Utilization: 50.0%
[INFO] [amlb.utils.process:02:41:52.172] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] Memory Usage: 55.4%
[INFO] [amlb.utils.process:02:41:52.172] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] Disk Usage: 14.9%
[INFO] [amlb.utils.process:02:43:52.172] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] CPU Utilization: 50.0%
[INFO] [amlb.utils.process:02:43:52.173] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] Memory Usage: 55.4%
[INFO] [amlb.utils.process:02:43:52.173] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] Disk Usage: 14.9%
[INFO] [amlb.utils.process:02:45:52.173] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] CPU Utilization: 50.0%
[INFO] [amlb.utils.process:02:45:52.174] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] Memory Usage: 55.4%
[INFO] [amlb.utils.process:02:45:52.174] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] Disk Usage: 14.9%
[INFO] [amlb.utils.process:02:47:52.174] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] CPU Utilization: 50.0%
[INFO] [amlb.utils.process:02:47:52.175] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] Memory Usage: 55.4%
[INFO] [amlb.utils.process:02:47:52.175] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] Disk Usage: 14.9%
[INFO] [amlb.utils.process:02:49:52.175] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] CPU Utilization: 50.0%
[INFO] [amlb.utils.process:02:49:52.176] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] Memory Usage: 55.4%
[INFO] [amlb.utils.process:02:49:52.176] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] Disk Usage: 14.9%
[INFO] [amlb.utils.process:02:51:52.176] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] CPU Utilization: 50.0%
[INFO] [amlb.utils.process:02:51:52.177] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] Memory Usage: 55.4%
[INFO] [amlb.utils.process:02:51:52.177] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] Disk Usage: 14.9%
[INFO] [amlb.utils.process:02:53:52.177] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] CPU Utilization: 50.0%
[INFO] [amlb.utils.process:02:53:52.178] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] Memory Usage: 55.4%
[INFO] [amlb.utils.process:02:53:52.178] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] Disk Usage: 14.9%
[INFO] [amlb.utils.process:02:55:52.178] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] CPU Utilization: 50.0%
[INFO] [amlb.utils.process:02:55:52.179] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] Memory Usage: 55.4%
[INFO] [amlb.utils.process:02:55:52.179] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] Disk Usage: 14.9%
[INFO] [amlb.utils.process:02:57:52.179] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] CPU Utilization: 50.0%
[INFO] [amlb.utils.process:02:57:52.179] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] Memory Usage: 55.4%
[INFO] [amlb.utils.process:02:57:52.180] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] Disk Usage: 14.9%
[INFO] [amlb.utils.process:02:59:52.180] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] CPU Utilization: 50.0%
[INFO] [amlb.utils.process:02:59:52.180] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] Memory Usage: 55.4%
[INFO] [amlb.utils.process:02:59:52.181] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] Disk Usage: 14.9%
[INFO] [amlb.utils.process:03:01:52.181] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] CPU Utilization: 50.0%
[INFO] [amlb.utils.process:03:01:52.181] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] Memory Usage: 55.4%
[INFO] [amlb.utils.process:03:01:52.181] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] Disk Usage: 14.9%
[INFO] [amlb.utils.process:03:03:52.182] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] CPU Utilization: 50.0%
[INFO] [amlb.utils.process:03:03:52.182] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] Memory Usage: 55.4%
[INFO] [amlb.utils.process:03:03:52.182] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] Disk Usage: 14.9%
[INFO] [amlb.utils.process:03:05:52.183] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] CPU Utilization: 50.0%
[INFO] [amlb.utils.process:03:05:52.183] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] Memory Usage: 55.4%
[INFO] [amlb.utils.process:03:05:52.183] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] Disk Usage: 14.9%
[INFO] [amlb.utils.process:03:07:52.184] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] CPU Utilization: 50.0%
[INFO] [amlb.utils.process:03:07:52.184] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] Memory Usage: 55.4%
[INFO] [amlb.utils.process:03:07:52.184] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] Disk Usage: 14.9%
[INFO] [amlb.utils.process:03:09:52.185] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] CPU Utilization: 50.0%
[INFO] [amlb.utils.process:03:09:52.185] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] Memory Usage: 55.4%
[INFO] [amlb.utils.process:03:09:52.185] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] Disk Usage: 14.9%
[INFO] [amlb.utils.process:03:11:52.186] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] CPU Utilization: 50.0%
[INFO] [amlb.utils.process:03:11:52.186] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] Memory Usage: 55.4%
[INFO] [amlb.utils.process:03:11:52.186] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] Disk Usage: 14.9%
[INFO] [amlb.utils.process:03:13:52.187] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] CPU Utilization: 50.0%
[INFO] [amlb.utils.process:03:13:52.187] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] Memory Usage: 55.4%
[INFO] [amlb.utils.process:03:13:52.187] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] Disk Usage: 14.9%
[INFO] [amlb.utils.process:03:15:52.188] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] CPU Utilization: 50.0%
[INFO] [amlb.utils.process:03:15:52.188] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] Memory Usage: 55.4%
[INFO] [amlb.utils.process:03:15:52.188] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] Disk Usage: 14.9%
[INFO] [amlb.utils.process:03:17:52.188] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] CPU Utilization: 50.0%
[INFO] [amlb.utils.process:03:17:52.189] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] Memory Usage: 55.4%
[INFO] [amlb.utils.process:03:17:52.189] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] Disk Usage: 14.9%
[INFO] [amlb.utils.process:03:19:52.190] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] CPU Utilization: 50.0%
[INFO] [amlb.utils.process:03:19:52.190] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] Memory Usage: 55.4%
[INFO] [amlb.utils.process:03:19:52.190] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] Disk Usage: 14.9%
[INFO] [amlb.utils.process:03:21:52.191] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] CPU Utilization: 50.0%
[INFO] [amlb.utils.process:03:21:52.191] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] Memory Usage: 55.4%
[INFO] [amlb.utils.process:03:21:52.191] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] Disk Usage: 14.9%
[INFO] [amlb.utils.process:03:23:52.192] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] CPU Utilization: 50.0%
[INFO] [amlb.utils.process:03:23:52.192] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] Memory Usage: 55.4%
[INFO] [amlb.utils.process:03:23:52.192] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] Disk Usage: 14.9%
[INFO] [amlb.utils.process:03:25:52.193] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] CPU Utilization: 50.0%
[INFO] [amlb.utils.process:03:25:52.193] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] Memory Usage: 55.4%
[INFO] [amlb.utils.process:03:25:52.193] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] Disk Usage: 14.9%
[INFO] [amlb.utils.process:03:27:52.194] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] CPU Utilization: 50.0%
[INFO] [amlb.utils.process:03:27:52.194] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] Memory Usage: 55.4%
[INFO] [amlb.utils.process:03:27:52.194] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] Disk Usage: 14.9%
[INFO] [amlb.utils.process:03:29:52.195] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] CPU Utilization: 50.0%
[INFO] [amlb.utils.process:03:29:52.195] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] Memory Usage: 55.4%
[INFO] [amlb.utils.process:03:29:52.195] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] Disk Usage: 14.9%
[INFO] [amlb.utils.process:03:31:52.196] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] CPU Utilization: 50.0%
[INFO] [amlb.utils.process:03:31:52.196] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] Memory Usage: 55.4%
[INFO] [amlb.utils.process:03:31:52.196] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] Disk Usage: 14.9%
[INFO] [amlb.utils.process:03:33:52.197] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] CPU Utilization: 50.0%
[INFO] [amlb.utils.process:03:33:52.197] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] Memory Usage: 55.5%
[INFO] [amlb.utils.process:03:33:52.197] [MONITORING] [local.ag.1h8c.KDDCup09-Upselling.0.AutoGluon] Disk Usage: 14.9%
[WARNING] [amlb.utils.process:03:34:51.706] Interrupting thread MainThread [ident=140557599672128] after 7500s timeout.
[WARNING] [amlb.utils.process:03:35:51.706] Interrupting thread MainThread [ident=140557599672128] after 7500s timeout.
[ERROR] [amlb.benchmark:03:35:51.707] Interrupting thread MainThread [ident=140557599672128] after 7500s timeout.
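The log above contains the line `sklearn.ensemble.RandomForestClassifier.fit: running accelerated version on CPU`. This looks like scikit-learn-intelex's verbose-mode output, which reports for each call whether the accelerated or the stock implementation ran. Below is a minimal sketch for enabling that output in a standalone script. It assumes the `sklearnex` logger name described in the scikit-learn-intelex documentation.

```python
import logging

# A root handler is needed: without one, Python only prints WARNING and above.
logging.basicConfig(format="%(levelname)s %(name)s: %(message)s")

# Assumption: sklearnex emits its per-call dispatch messages on this logger at
# INFO level, saying whether the accelerated or the stock code path was used.
logging.getLogger("sklearnex").setLevel(logging.INFO)
```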
Environment:
- OS: Linux
- Compiler: Unknown
- Version: 2021.5.1 (Python 3.7)
- Instance: m5.2xlarge EC2 (32 GB memory, 8 vCPUs)
pip freeze:
absl-py==1.0.0
aiohttp==3.8.1
aiosignal==1.2.0
antlr4-python3-runtime==4.8
async-timeout==4.0.2
asynctest==0.13.0
attrs==21.4.0
autocfg==0.0.8
-e git+https://github.com/awslabs/autogluon.git@f131eff34ead011bfca2f2f985919ad774cd23d3#egg=autogluon&subdirectory=autogluon
autogluon-contrib-nlp==0.0.1b20220208
-e git+https://github.com/awslabs/autogluon.git@f131eff34ead011bfca2f2f985919ad774cd23d3#egg=autogluon.common&subdirectory=common
-e git+https://github.com/awslabs/autogluon.git@f131eff34ead011bfca2f2f985919ad774cd23d3#egg=autogluon.core&subdirectory=core
-e git+https://github.com/awslabs/autogluon.git@f131eff34ead011bfca2f2f985919ad774cd23d3#egg=autogluon.features&subdirectory=features
-e git+https://github.com/awslabs/autogluon.git@f131eff34ead011bfca2f2f985919ad774cd23d3#egg=autogluon.tabular&subdirectory=tabular
-e git+https://github.com/awslabs/autogluon.git@f131eff34ead011bfca2f2f985919ad774cd23d3#egg=autogluon.text&subdirectory=text
-e git+https://github.com/awslabs/autogluon.git@f131eff34ead011bfca2f2f985919ad774cd23d3#egg=autogluon.vision&subdirectory=vision
blis==0.7.6
boto3==1.21.12
botocore==1.24.12
cachetools==5.0.0
catalogue==2.0.6
catboost==1.0.4
certifi==2021.10.8
charset-normalizer==2.0.12
click==8.0.4
cloudpickle==2.0.0
colorama==0.4.4
contextvars==2.4
cycler==0.11.0
cymem==2.0.6
daal==2021.5.3
daal4py==2021.5.3
dask==2021.11.2
Deprecated==1.2.13
distributed==2021.11.2
fairscale==0.4.5
fastai==2.5.3
fastcore==1.3.29
fastdownload==0.0.5
fastprogress==1.0.2
filelock==3.6.0
flake8==4.0.1
fonttools==4.29.1
frozenlist==1.3.0
fsspec==2022.2.0
future==0.18.2
gluoncv==0.10.4.post4
google-auth==2.6.0
google-auth-oauthlib==0.4.6
graphviz==0.19.1
grpcio==1.44.0
HeapDict==1.0.1
huggingface-hub==0.4.0
idna==3.3
imageio==2.16.1
immutables==0.16
importlib-metadata==4.2.0
importlib-resources==5.4.0
Jinja2==3.0.3
jmespath==0.10.0
joblib==1.1.0
jsonschema==4.4.0
kiwisolver==1.3.2
langcodes==3.3.0
lightgbm==3.3.2
locket==0.2.1
Markdown==3.3.4
MarkupSafe==2.1.0
matplotlib==3.5.1
mccabe==0.6.1
msgpack==1.0.3
multidict==6.0.2
murmurhash==1.0.6
networkx==2.6.3
nptyping==1.4.4
numpy==1.21.0
oauthlib==3.2.0
omegaconf==2.1.1
opencv-python==4.5.5.62
packaging==21.3
pandas==1.3.5
partd==1.2.0
pathy==0.6.1
Pillow==9.0.1
pip==22.0.3
plotly==5.6.0
portalocker==2.4.0
preshed==3.0.6
protobuf==3.19.4
psutil==5.8.0
pyarrow==4.0.1
pyasn1==0.4.8
pyasn1-modules==0.2.8
pycodestyle==2.8.0
pydantic==1.8.2
pyDeprecate==0.3.1
pyflakes==2.4.0
pyparsing==3.0.7
pyrsistent==0.18.1
python-dateutil==2.8.2
pytorch-lightning==1.5.10
pytz==2021.3
PyWavelets==1.2.0
PyYAML==6.0
ray==1.8.0
redis==4.1.4
regex==2022.3.2
requests==2.27.1
requests-oauthlib==1.3.1
rsa==4.8
ruamel.yaml==0.17.4
ruamel.yaml.clib==0.2.2
s3transfer==0.5.2
sacrebleu==2.0.0
sacremoses==0.0.47
scikit-image==0.19.2
scikit-learn==1.0.2
scikit-learn-intelex==2021.5.3
scipy==1.7.3
sentencepiece==0.1.95
setuptools==59.5.0
six==1.16.0
smart-open==5.2.1
sortedcontainers==2.4.0
spacy==3.2.3
spacy-legacy==3.0.9
spacy-loggers==1.0.1
srsly==2.4.2
tabulate==0.8.9
tbb==2021.5.1
tblib==1.7.0
tenacity==8.0.1
tensorboard==2.8.0
tensorboard-data-server==0.6.1
tensorboard-plugin-wit==1.8.1
thinc==8.0.13
threadpoolctl==3.1.0
tifffile==2021.11.2
timm==0.5.4
timm-clean==0.4.12
tokenizers==0.11.6
toolz==0.11.2
torch==1.10.2
torchmetrics==0.7.2
torchvision==0.11.3
tornado==6.1
tqdm==4.63.0
transformers==4.16.2
typer==0.4.0
typing-extensions==3.10.0.2
typish==1.9.3
urllib3==1.26.8
wasabi==0.9.0
Werkzeug==2.0.3
wheel==0.37.1
wrapt==1.13.3
xgboost==1.4.2
yacs==0.1.8
yarl==1.7.2
zict==2.1.0
zipp==3.7.0
Hi @Innixma, thanks for creating the issue. We will analyze the problem.
Hey @Innixma, I gave it a try and trained an RF on this dataset, and I do not see the issue you are reporting. I'm measuring 11m41s of training time for the optimized version vs. 10m54s for stock scikit-learn. The performance drop is already reported in https://github.com/intel/scikit-learn-intelex/issues/1050.
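In relative terms, these two timings put the optimized fit about 7% behind stock scikit-learn on this run. The small helper below does the conversion. The `11m41s` notation it parses is simply the one used in this comment, and the function names are illustrative.

```python
import re


def to_seconds(duration: str) -> int:
    """Convert durations written like '11m41s', '54s' or '3m' into seconds."""
    match = re.fullmatch(r"(?:(\d+)m)?(?:(\d+)s)?", duration)
    if not duration or match is None:
        raise ValueError(f"unrecognised duration: {duration!r}")
    minutes, seconds = (int(part) if part else 0 for part in match.groups())
    return 60 * minutes + seconds


def percent_slower(candidate: str, baseline: str) -> float:
    """How much slower `candidate` is than `baseline`, in percent (negative = faster)."""
    return 100.0 * (to_seconds(candidate) / to_seconds(baseline) - 1.0)


print(f"{percent_slower('11m41s', '10m54s'):.1f}% slower")  # prints "7.2% slower"
```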
Keep in mind that you're running on a huge dataset. I'm using a machine with 250 GB of memory, of which 50 GB were allocated while the tests were running. It's very likely that the 32 GB on your EC2 instance just aren't enough.
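For reference, here is a rough size estimate for the input alone. It assumes that each of the 1260 + 6223 + 5797 + 157 columns listed in the report becomes exactly one dense numeric column. The real encoding may add or drop columns.

```python
def dense_matrix_gb(n_rows: int, n_cols: int, bytes_per_value: int) -> float:
    """Size of a dense n_rows x n_cols matrix in gigabytes (10**9 bytes)."""
    return n_rows * n_cols * bytes_per_value / 1e9


# Column counts from the bug report: float, int, boolean, category.
n_cols = 1260 + 6223 + 5797 + 157  # 13,437
n_rows = 45_000

print(f"float64: {dense_matrix_gb(n_rows, n_cols, 8):.2f} GB")  # float64: 4.84 GB
print(f"float32: {dense_matrix_gb(n_rows, n_cols, 4):.2f} GB")  # float32: 2.42 GB
```

That is roughly 4.8 GB as float64, or 2.4 GB as float32, for a single copy of the feature matrix. Anything the fit allocates on top of that is not captured by this estimate.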
Thanks @ahuber21! I will revisit this once the accuracy drop issue has been resolved, at which point I can run the benchmark again and see whether the hang still occurs. It is reasonable to suspect a memory issue. If it occurs again, I'll try differently sized instances to find the breaking point and to check whether native RF uses more or less memory than scikit-learn-intelex RF.
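One way to find that breaking point is to run each fit in a separate process and read that process's peak resident memory. A wall-clock limit keeps a hanging fit from blocking the run. The sketch below assumes Linux: it relies on the `fork` start method and on `ru_maxrss` being reported in kilobytes. `peak_rss_mib` is a hypothetical helper, not an AutoGluon or scikit-learn-intelex API.

```python
import multiprocessing as mp
import resource
from typing import Any, Callable, Optional


def _run_and_report(queue, target: Callable[..., Any], args: tuple) -> None:
    # Runs in the child: call the workload, then report this process's peak RSS.
    try:
        target(*args)
        peak_kib = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss  # KiB on Linux
        queue.put(("ok", peak_kib / 1024))
    except Exception as exc:  # report failures instead of leaving the parent waiting
        queue.put(("error", repr(exc)))


def peak_rss_mib(target: Callable[..., Any], *args: Any,
                 timeout_s: Optional[float] = None) -> float:
    """Run target(*args) in a forked child and return the child's peak RSS in MiB.

    The child starts as a copy of the parent, so its peak may include memory
    inherited at fork time; compare runs against each other rather than
    reading the number as an absolute cost.
    """
    ctx = mp.get_context("fork")
    queue = ctx.Queue()
    proc = ctx.Process(target=_run_and_report, args=(queue, target, args))
    proc.start()
    proc.join(timeout_s)
    if proc.is_alive():
        proc.terminate()
        proc.join()
        raise TimeoutError(f"workload still running after {timeout_s} s")
    if proc.exitcode != 0:
        # e.g. -9 when the kernel OOM killer terminates the child
        raise RuntimeError(f"child exited with code {proc.exitcode}")
    status, value = queue.get(timeout=5)
    if status != "ok":
        raise RuntimeError(f"workload raised {value}")
    return value
```

Pass a function that fits the stock or the patched forest as `target`. Importing the heavy libraries inside that function, rather than in the parent, keeps the forked child from inheriting thread pools started before the fork.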
Hey @Innixma, the accuracy drop is understood: it stems from a performance optimization that does not suit all use cases. I am working on an API change that will give users more control, so they can trade performance against accuracy themselves. For now, you could run KDDCup09-Upselling again with 2023.1.1 to see whether performance has improved. In a second run, you could modify https://github.com/intel/scikit-learn-intelex/blob/master/daal4py/sklearn/ensemble/_forest.py#L254 and set memorySavingMode=True. This will cause sklearnex to fall back to the default scikit-learn algorithm, which is a bit slower but also more exact.
As I said, we're working on an update to make this less messy and more transparent.
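A line number on `master` drifts between releases, so it may be easier to search the installed copy of the file for the setting. The stdlib helper below does that; its name, `find_setting_lines`, is hypothetical. The module path `daal4py.sklearn.ensemble._forest` is taken from the link above and may differ in other versions.

```python
from __future__ import annotations

import importlib.util
from pathlib import Path


def find_setting_lines(module_name: str, needle: str) -> list[tuple[int, str]]:
    """Return (line_number, stripped_line) for each source line containing needle."""
    spec = importlib.util.find_spec(module_name)
    if spec is None or spec.origin is None or not spec.origin.endswith(".py"):
        raise ModuleNotFoundError(f"no Python source found for {module_name!r}")
    source = Path(spec.origin).read_text(encoding="utf-8")
    return [
        (number, line.strip())
        for number, line in enumerate(source.splitlines(), start=1)
        if needle in line
    ]
```

For example, `find_setting_lines("daal4py.sklearn.ensemble._forest", "memorySavingMode")` lists the matching lines with their line numbers in the installed file. `importlib.util.find_spec(...).origin` gives that file's path.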
That's great to hear, @ahuber21! I plan to test it once it is part of an official release. We are working on automating our benchmarking logic, which should let us test these options relatively easily once they become available in the next few months.
We are planning to release AutoGluon v1.0 by EOY. I think our team will focus our efforts on determining which backend we use for each model type prior to v1.0 release, probably starting around July, and at that point we will do a deep dive comparing native scikit-learn with scikit-learn-intelex and potentially other packages. At minimum this comparison would include RandomForest, ExtraTrees, KNearestNeighbors, & LinearRegression/LogisticRegression, but could potentially be more.
The criteria we will use to decide which backend to use are, in priority order:
- Accuracy
- Stability & Coverage (aka doesn't crash, can handle expected data formats, doesn't lead to unexpected OOM, doesn't introduce significant limitations)
- Dependency Burden (aka the additional size of pip install requirements for using the backend)
- Training Speed
- OS Support
- Inference Speed (ranked lower because our models are already very fast at inference, though faster is always better for real-time inference scenarios)
- Artifact Size (aka model size on disk after training)
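To make this ordering mechanical, one option is a lexicographic ranking. Candidates are compared on the first criterion, and only when they land in the same bucket does the comparison fall through to the next one. The criterion names, bucket widths and values below are hypothetical placeholders. They do not describe AutoGluon's actual process or any measured results.

```python
from __future__ import annotations

import math

# (criterion, higher_is_better, bucket_width) in the priority order listed above.
# Bucket widths are hypothetical: values in the same bucket count as a tie.
CRITERIA = [
    ("accuracy", True, 0.005),
    ("stability", True, 0.01),  # e.g. fraction of benchmark tasks that finish
    ("dependency_mb", False, 25.0),
    ("train_seconds", False, 10.0),
    ("os_supported", True, 1.0),
    ("infer_seconds", False, 0.01),
    ("artifact_mb", False, 10.0),
]


def priority_key(scores: dict[str, float]) -> tuple[int, ...]:
    """Sort key: smaller tuples are preferred; a tie in one bucket defers to the next criterion."""
    key = []
    for name, higher_is_better, width in CRITERIA:
        bucket = math.floor(scores[name] / width)
        key.append(-bucket if higher_is_better else bucket)
    return tuple(key)


def rank_backends(candidates: dict[str, dict[str, float]]) -> list[str]:
    """Return backend names, most preferred first."""
    return sorted(candidates, key=lambda name: priority_key(candidates[name]))
```

Bucketing keeps the key a proper total order. Comparing raw values with a tolerance would not be transitive, so sorting on it could give inconsistent results.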