
`hyperoptsklearn:latest` seems to not run correctly

Open · eddiebergman opened this issue 3 years ago

I think the setup for integrating with hyperoptsklearn may be broken. I've also run it for longer, but I assume the failure is somewhere in the setup code.

python runbenchmark.py -f 0 -s auto hyperoptsklearn:latest example
CTRL+C

id, task, fold, ...
... AttributeError: 'str' object has no attribute 'shape' ...
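For what it's worth, that AttributeError is the generic error you get whenever a plain Python string ends up somewhere that expects an array-like with a .shape attribute, which fits the suspicion that the integration's setup/data-loading code is handing the framework the wrong type. A trivial standalone illustration (not benchmark code):

import numpy as np

print(np.asarray([[1, 2], [3, 4]]).shape)  # arrays have a shape: (2, 2)
try:
    "not an array".shape  # a plain str does not
except AttributeError as e:
    print(e)  # 'str' object has no attribute 'shape'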

eddiebergman avatar Oct 03 '21 19:10 eddiebergman

Very possible; I don't think anyone has tried hyperoptsklearn in a long time now. It's not even included in our validation test suite.

Thanks for raising this, we'll look at it.

sebhrusen avatar Oct 03 '21 19:10 sebhrusen

This:

yes | python runbenchmark.py hyperoptsklearn example test -f 0 -m docker -s force

is working fine.

Now, out of 20 frameworks, hyperoptsklearn is the only one failing on my data, for a reason I cannot understand:

yes | python3 runbenchmark.py hyperoptsklearn automl_config_docker 1h4c -m docker -i .
...
-----------------------------------------------------------------------
Starting job local.automl_config_docker.1h4c.teddata.0.hyperoptsklearn.
Assigning 4 cores (total=128) for new task teddata.
Assigning 144933 MB (total=257672 MB) for new teddata task.
[MONITORING] [local.automl_config_docker.1h4c.teddata.0.hyperoptsklearn] CPU Utilization: 8.5%
[MONITORING] [local.automl_config_docker.1h4c.teddata.0.hyperoptsklearn] Memory Usage: 43.0%
[MONITORING] [local.automl_config_docker.1h4c.teddata.0.hyperoptsklearn] Disk Usage: 99.1%
Using training set /input/test_data/differentiate_cancer_train.arff with test set /input/test_data/differentiate_cancer_test.arff.
Running task teddata on framework hyperoptsklearn with config:
TaskConfig({'framework': 'hyperoptsklearn', 'framework_params': {}, 'framework_version': 'latest', 'type': 'classification', 'name': 'teddata', 'fold': 0, 'metric': 'auc', 'metrics': ['auc', 'logloss', 'acc', 'balacc'], 'seed': 1547281647, 'job_timeout_seconds': 800, 'max_runtime_seconds': 400, 'cores': 4, 'max_mem_size_mb': 144933, 'min_vol_size_mb': -1, 'input_dir': '/input', 'output_dir': '/output/', 'output_predictions_file': '/output/predictions/teddata/0/predictions.csv', 'ext': {}, 'type_': 'binary', 'output_metadata_file': '/output/predictions/teddata/0/metadata.json'})
Running cmd `/bench/frameworks/hyperoptsklearn/venv/bin/python -W ignore /bench/frameworks/hyperoptsklearn/exec.py`
INFO:__main__:
**** Hyperopt-sklearn [vlatest] ****

WARNING:__main__:Ignoring cores constraint of 4 cores.
INFO:__main__:Running hyperopt-sklearn with a maximum time of 400s on all cores, optimizing auc.

  0%|          | 0/1 [00:00<?, ?trial/s, best loss=?]ERROR:hyperopt.fmin:job exception: Only one class present in y_true. ROC AUC score is not defined in that case.

  0%|          | 0/1 [00:00<?, ?trial/s, best loss=?]
ERROR:frameworks.shared.callee:Only one class present in y_true. ROC AUC score is not defined in that case.

Traceback (most recent call last):
  File "/bench/frameworks/shared/callee.py", line 70, in call_run
    result = run_fn(ds, config)
  File "/bench/frameworks/hyperoptsklearn/exec.py", line 80, in run
    estimator.fit(X_train, y_train)
  File "/bench/frameworks/hyperoptsklearn/lib/hyperopt-sklearn/hpsklearn/estimator/estimator.py", line 464, in fit
    fit_iter.send(increment)
  File "/bench/frameworks/hyperoptsklearn/lib/hyperopt-sklearn/hpsklearn/estimator/estimator.py", line 339, in fit_iter
    hyperopt.fmin(_fn_with_timeout,
  File "/bench/frameworks/hyperoptsklearn/venv/lib/python3.8/site-packages/hyperopt/fmin.py", line 540, in fmin
    return trials.fmin(
  File "/bench/frameworks/hyperoptsklearn/venv/lib/python3.8/site-packages/hyperopt/base.py", line 671, in fmin
    return fmin(
  File "/bench/frameworks/hyperoptsklearn/venv/lib/python3.8/site-packages/hyperopt/fmin.py", line 586, in fmin
    rval.exhaust()
  File "/bench/frameworks/hyperoptsklearn/venv/lib/python3.8/site-packages/hyperopt/fmin.py", line 364, in exhaust
    self.run(self.max_evals - n_done, block_until_done=self.asynchronous)
  File "/bench/frameworks/hyperoptsklearn/venv/lib/python3.8/site-packages/hyperopt/fmin.py", line 300, in run
    self.serial_evaluate()
  File "/bench/frameworks/hyperoptsklearn/venv/lib/python3.8/site-packages/hyperopt/fmin.py", line 178, in serial_evaluate
    result = self.domain.evaluate(spec, ctrl)
  File "/bench/frameworks/hyperoptsklearn/venv/lib/python3.8/site-packages/hyperopt/base.py", line 892, in evaluate
    rval = self.fn(pyll_rval)
  File "/bench/frameworks/hyperoptsklearn/lib/hyperopt-sklearn/hpsklearn/estimator/estimator.py", line 311, in _fn_with_timeout
    raise fn_rval[1]
ValueError: Only one class present in y_true. ROC AUC score is not defined in that case.
...
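For context, the ValueError at the bottom of the traceback appears to come straight from scikit-learn's roc_auc_score, which refuses to compute AUC when the ground-truth labels contain only a single class, so this is underlying sklearn behaviour rather than something benchmark-specific. A minimal, standalone illustration:

from sklearn.metrics import roc_auc_score

# All ground-truth labels belong to one class, so AUC is undefined
# and sklearn raises the same ValueError seen in the log above.
try:
    roc_auc_score([1, 1, 1, 1], [0.2, 0.7, 0.4, 0.9])
except ValueError as e:
    print(e)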

The only hint I found with Google didn't help me much.

Any idea/suggestion is very much appreciated.

alanwilter avatar Sep 21 '22 18:09 alanwilter

I am not familiar with that particular problem, but maybe I can help diagnose it. Is your data heavily imbalanced or ordered? It sounds like hyperoptsklearn is using an internal validation procedure that can end up with an all-positive (or all-negative) validation fold. Looking at the source, the default implementation appears to use the last 20% of the data as the validation fold, so with ordered data it is quite possible that the last 20% consists of a single class, which would lead to exactly this error. To verify this is the issue, shuffle the data and try again if it is ordered, or oversample the minority class if the dataset is very imbalanced (see the sketch below for a quick way to check the tail of the file).
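In case it helps with checking that, here is a small sketch (the file path and target attribute name are placeholders for your dataset, not anything the benchmark defines) that looks at which classes appear in the last 20% of an ARFF file's rows:

from scipy.io import arff

# Placeholder path and target attribute name -- adjust to your data.
data, _meta = arff.loadarff("differentiate_cancer_train.arff")
labels = data["class"]  # replace "class" with your target attribute

tail = labels[int(len(labels) * 0.8):]  # roughly the last 20% of rows
print("classes in the last 20% of rows:", set(tail))
if len(set(tail)) < 2:
    print("Only one class in the tail, so an ordered dataset is a likely culprit.")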

PGijsbers avatar Sep 26 '22 12:09 PGijsbers

Fantastic @PGijsbers! I learned about bash's shuf, and once I shuffled my input data it worked nicely with hyperoptsklearn. You can close this ticket. Thanks!
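In case it helps anyone else hitting this: if the input is an ARFF file, shuffling only the rows after the @data marker keeps the header intact. A small Python sketch (file names are placeholders):

import random

src, dst = "train.arff", "train_shuffled.arff"  # placeholder file names

with open(src) as f:
    lines = f.readlines()

# Keep everything up to and including the @data line in its original order.
split = next(i for i, line in enumerate(lines) if line.strip().lower() == "@data") + 1
header, rows = lines[:split], [r for r in lines[split:] if r.strip()]

random.seed(0)  # make the shuffle reproducible
random.shuffle(rows)

with open(dst, "w") as f:
    f.writelines(header + rows)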

alanwilter avatar Sep 27 '22 17:09 alanwilter