`hyperoptsklearn:latest` seems to not run correctly
I think the setup for integrating with hyperoptsklearn may be broken. I've run this for longer as well, but I assume the problem is in some setup code.
```
python runbenchmark.py -f 0 -s auto hyperoptsklearn:latest example
CTRL+C
id, task, fold, ...
... AttributeError: 'str' object has no attribute 'shape' ...
```
Very possible, I don't think anyone has tried hyperoptsklearn for a long time now.
It's not even included in our validation test suite.
Thanks for raising this, we'll look at it.
This is working fine:

```
yes | python runbenchmark.py hyperoptsklearn example test -f 0 -m docker -s force
```
Now, out of 20 frameworks, hyperoptsklearn is the only one failing with my data, for a reason I cannot understand:

```
yes | python3 runbenchmark.py hyperoptsklearn automl_config_docker 1h4c -m docker -i .
```
```
...
-----------------------------------------------------------------------
Starting job local.automl_config_docker.1h4c.teddata.0.hyperoptsklearn.
Assigning 4 cores (total=128) for new task teddata.
Assigning 144933 MB (total=257672 MB) for new teddata task.
[MONITORING] [local.automl_config_docker.1h4c.teddata.0.hyperoptsklearn] CPU Utilization: 8.5%
[MONITORING] [local.automl_config_docker.1h4c.teddata.0.hyperoptsklearn] Memory Usage: 43.0%
[MONITORING] [local.automl_config_docker.1h4c.teddata.0.hyperoptsklearn] Disk Usage: 99.1%
Using training set /input/test_data/differentiate_cancer_train.arff with test set /input/test_data/differentiate_cancer_test.arff.
Running task teddata on framework hyperoptsklearn with config:
TaskConfig({'framework': 'hyperoptsklearn', 'framework_params': {}, 'framework_version': 'latest', 'type': 'classification', 'name': 'teddata', 'fold': 0, 'metric': 'auc', 'metrics': ['auc', 'logloss', 'acc', 'balacc'], 'seed': 1547281647, 'job_timeout_seconds': 800, 'max_runtime_seconds': 400, 'cores': 4, 'max_mem_size_mb': 144933, 'min_vol_size_mb': -1, 'input_dir': '/input', 'output_dir': '/output/', 'output_predictions_file': '/output/predictions/teddata/0/predictions.csv', 'ext': {}, 'type_': 'binary', 'output_metadata_file': '/output/predictions/teddata/0/metadata.json'})
Running cmd `/bench/frameworks/hyperoptsklearn/venv/bin/python -W ignore /bench/frameworks/hyperoptsklearn/exec.py`
INFO:__main__:
**** Hyperopt-sklearn [vlatest] ****
WARNING:__main__:Ignoring cores constraint of 4 cores.
INFO:__main__:Running hyperopt-sklearn with a maximum time of 400s on all cores, optimizing auc.
0%| | 0/1 [00:00<?, ?trial/s, best loss=?]ERROR:hyperopt.fmin:job exception: Only one class present in y_true. ROC AUC score is not defined in that case.
0%| | 0/1 [00:00<?, ?trial/s, best loss=?]
ERROR:frameworks.shared.callee:Only one class present in y_true. ROC AUC score is not defined in that case.
Traceback (most recent call last):
  File "/bench/frameworks/shared/callee.py", line 70, in call_run
    result = run_fn(ds, config)
  File "/bench/frameworks/hyperoptsklearn/exec.py", line 80, in run
    estimator.fit(X_train, y_train)
  File "/bench/frameworks/hyperoptsklearn/lib/hyperopt-sklearn/hpsklearn/estimator/estimator.py", line 464, in fit
    fit_iter.send(increment)
  File "/bench/frameworks/hyperoptsklearn/lib/hyperopt-sklearn/hpsklearn/estimator/estimator.py", line 339, in fit_iter
    hyperopt.fmin(_fn_with_timeout,
  File "/bench/frameworks/hyperoptsklearn/venv/lib/python3.8/site-packages/hyperopt/fmin.py", line 540, in fmin
    return trials.fmin(
  File "/bench/frameworks/hyperoptsklearn/venv/lib/python3.8/site-packages/hyperopt/base.py", line 671, in fmin
    return fmin(
  File "/bench/frameworks/hyperoptsklearn/venv/lib/python3.8/site-packages/hyperopt/fmin.py", line 586, in fmin
    rval.exhaust()
  File "/bench/frameworks/hyperoptsklearn/venv/lib/python3.8/site-packages/hyperopt/fmin.py", line 364, in exhaust
    self.run(self.max_evals - n_done, block_until_done=self.asynchronous)
  File "/bench/frameworks/hyperoptsklearn/venv/lib/python3.8/site-packages/hyperopt/fmin.py", line 300, in run
    self.serial_evaluate()
  File "/bench/frameworks/hyperoptsklearn/venv/lib/python3.8/site-packages/hyperopt/fmin.py", line 178, in serial_evaluate
    result = self.domain.evaluate(spec, ctrl)
  File "/bench/frameworks/hyperoptsklearn/venv/lib/python3.8/site-packages/hyperopt/base.py", line 892, in evaluate
    rval = self.fn(pyll_rval)
  File "/bench/frameworks/hyperoptsklearn/lib/hyperopt-sklearn/hpsklearn/estimator/estimator.py", line 311, in _fn_with_timeout
    raise fn_rval[1]
ValueError: Only one class present in y_true. ROC AUC score is not defined in that case.
...
```
The only hint I found with Google didn't help me much. Any idea or suggestion is very much appreciated.
I am not familiar with that particular problem, but maybe I can help diagnose the issue. Is your data heavily imbalanced or ordered? It sounds like hyperoptsklearn is using an internal validation procedure which can generate an all-positive (or all-negative) validation fold. Looking at the source, the default implementation uses the last 20% of the data as the validation fold. With ordered data it's quite possible that the last 20% all belong to the same class, which would lead to this error. To verify this is the issue, shuffle the data and try again if the data is ordered, or oversample the minority class if the dataset is very imbalanced.
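As a quick check, something along these lines should tell you whether the last 20% of your training data is single-class (the file path is a placeholder for wherever the training ARFF lives on your machine, and I'm assuming the class is the last column):

```python
# Minimal sketch: inspect the class distribution of the last 20% of the
# training data, which is what hpsklearn's default split holds out.
import pandas as pd
from scipy.io import arff

data, _ = arff.loadarff("differentiate_cancer_train.arff")  # placeholder path
df = pd.DataFrame(data)

holdout = df.iloc[int(len(df) * 0.8):]          # the last 20% of the rows
print(holdout[df.columns[-1]].value_counts())   # a single class here would reproduce the error
```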
Fantastic @PGijsbers! I learned about the `shuf` command, and once I shuffled my input data it worked nicely with hyperoptsklearn. You can close this ticket. Thanks!
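In case it helps anyone else, a rough Python equivalent of that shuffle (file names are just placeholders) which keeps the ARFF header intact looks like this:

```python
# Shuffle only the data section of an ARFF file; running a plain line
# shuffle over the whole file would also scramble the header.
import random

with open("differentiate_cancer_train.arff") as f:          # placeholder input
    lines = f.readlines()

# Everything up to and including the @DATA marker is the header.
data_start = next(i for i, line in enumerate(lines)
                  if line.strip().lower() == "@data") + 1
header, rows = lines[:data_start], lines[data_start:]

random.seed(0)
random.shuffle(rows)

with open("differentiate_cancer_train_shuffled.arff", "w") as f:  # placeholder output
    f.writelines(header + rows)
```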