automlbenchmark
Why can't I use my GPUs?
Even with docker, it's possible to run with GPUs via https://github.com/NVIDIA/nvidia-docker.

I see that the `requirements` files don't have `pytorch` or `tensorflow`. Is that intentional? I don't find any mention of GPUs in the documentation either. Or am I missing something?
@alanwilter which framework are you trying to use exactly? Detection and usage of GPUs will depend on the framework.
> I see that requirements files don't have pytorch or tensorflow. Is it intentional?
If you mean https://github.com/openml/automlbenchmark/blob/master/requirements.txt, these are the requirements for the `amlb` app, not for any of the frameworks that you can currently run by default. Some of those frameworks may require pytorch, for example, and in that case it is their responsibility to install it, using their `frameworks/xxx/setup.sh`, into their dedicated virtual env.
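To make that concrete, here is a hypothetical sketch of what such a framework `setup.sh` could look like. The shared setup helper and the `PIP` function are assumptions modeled on how existing framework setup scripts in the repo are structured; check an actual framework's `setup.sh` before copying this.

```shell
#!/usr/bin/env bash
# Hypothetical frameworks/xxx/setup.sh sketch (not an official example).
HERE=$(dirname "$0")

# Assumption: the shared helper creates/activates this framework's
# dedicated virtual env and defines a PIP function targeting it.
. "$HERE/../shared/setup.sh" "$HERE" true

# Install the framework's own deep-learning dependencies here (e.g. a
# GPU-enabled torch build) so they live only in the framework's venv,
# not in the amlb app environment.
PIP install --no-cache-dir "torch>=1.6.0,<1.7.0"
```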
> Even with docker, it's possible to run with GPUs via https://github.com/NVIDIA/nvidia-docker.
The docker images we build by default don't include the NVIDIA drivers; installing them is something that would need to be done in the framework's `setup.sh` if it wants to leverage GPUs.
I'm just testing for the time being. Which framework do you suggest if I want to use/test with our GPUs?
As for docker, we simply add `--gpus N` to the `docker run ...` command, assuming, of course, that the container has pytorch/tensorflow etc. installed (one needs to read the instructions and set up accordingly). `nvidia-docker` exists precisely to avoid you meddling with your docker container.
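To illustrate the point about `--gpus`, here is a small stdlib-only sketch of how one might build a `docker run` command line that requests GPUs only when the host has NVIDIA tooling. This helper is purely illustrative and not part of amlb; amlb does not add `--gpus` to the commands it builds for you.

```python
import shutil

def docker_run_cmd(image, args=(), gpus=None, nvidia_available=None):
    """Build a `docker run` argument list, optionally requesting GPUs.

    `gpus` may be an int or "all", matching docker's --gpus flag.
    Illustrative only: amlb itself does not pass --gpus for you.
    """
    if nvidia_available is None:
        # Heuristic: the NVIDIA driver/toolkit ships nvidia-smi on the host.
        nvidia_available = shutil.which("nvidia-smi") is not None
    cmd = ["docker", "run", "--rm"]
    if gpus is not None and nvidia_available:
        cmd += ["--gpus", str(gpus)]
    return cmd + [image, *args]
```

For example, `docker_run_cmd("automlbenchmark/mlplan:stable", gpus="all", nvidia_available=True)` yields a command containing `--gpus all`, while on a host without NVIDIA tooling the flag is simply omitted so `docker run` does not fail.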
Looking at the frameworks in detail, I only found:

- `frameworks/MLPlan`: `torch>=1.6.0,<1.7.0`

Is that so?
I tried to run it on the `stable-v2` branch, in both `local` and `docker` modes, and both failed with an error:
```
python3 runbenchmark.py MLPlanSKLearn -m docker
...
Download ML-Plan from extern
Successfully built docker image automlbenchmark/mlplan:stable-dev.
----------------------------------------------------------------
Starting job docker.test.test.all_tasks.all_folds.MLPlanSKLearn.
[MONITORING] [docker.test.test.all_tasks.all_folds.MLPlanSKLearn] CPU Utilization: 77.1%
Starting docker: docker run --name mlplansklearn.test.test.docker.20220518T183639.zIBG03NxB4sbsopS82G7nA__ --shm-size=2048M -v /mnt/data/awilter/cache/openml:/input -v /mnt/data/awilter/automlbenchmark/results/mlplansklearn.test.test.docker.20220518T183639:/output -v /home/awilter/.config/automlbenchmark:/custom --rm automlbenchmark/mlplan:stable-dev MLPlanSKLearn test test -Xseed=auto -i /input -o /output -u /custom -s skip -Xrun_mode=docker --session=.
[MONITORING] [docker.test.test.all_tasks.all_folds.MLPlanSKLearn] Memory Usage: 17.9%
Datasets are loaded by default from folder /mnt/data/awilter/cache/openml.
Generated files will be available in folder /mnt/data/awilter/automlbenchmark/results.
[MONITORING] [docker.test.test.all_tasks.all_folds.MLPlanSKLearn] Disk Usage: 84.0%
Running cmd `docker run --name mlplansklearn.test.test.docker.20220518T183639.zIBG03NxB4sbsopS82G7nA__ --shm-size=2048M -v /mnt/data/awilter/cache/openml:/input -v /mnt/data/awilter/automlbenchmark/results/mlplansklearn.test.test.docker.20220518T183639:/output -v /home/awilter/.config/automlbenchmark:/custom --rm automlbenchmark/mlplan:stable-dev MLPlanSKLearn test test -Xseed=auto -i /input -o /output -u /custom -s skip -Xrun_mode=docker --session=`
Unable to find image 'automlbenchmark/mlplan:stable-dev' locally
docker: Error response from daemon: manifest for automlbenchmark/mlplan:stable-dev not found: manifest unknown: manifest unknown.
See 'docker run --help'.
Running cmd `docker kill mlplansklearn.test.test.docker.20220518T183639.zIBG03NxB4sbsopS82G7nA__`
Error response from daemon: Cannot kill container: mlplansklearn.test.test.docker.20220518T183639.zIBG03NxB4sbsopS82G7nA__: No such container: mlplansklearn.test.test.docker.20220518T183639.zIBG03NxB4sbsopS82G7nA__
Job `docker.test.test.all_tasks.all_folds.MLPlanSKLearn` failed with error: Command 'docker run --name mlplansklearn.test.test.docker.20220518T183639.zIBG03NxB4sbsopS82G7nA__ --shm-size=2048M -v /mnt/data/awilter/cache/openml:/input -v /mnt/data/awilter/automlbenchmark/results/mlplansklearn.test.test.docker.20220518T183639:/output -v /home/awilter/.config/automlbenchmark:/custom --rm automlbenchmark/mlplan:stable-dev MLPlanSKLearn test test -Xseed=auto -i /input -o /output -u /custom -s skip -Xrun_mode=docker --session=' returned non-zero exit status 125.
Traceback (most recent call last):
  File "/mnt/data/awilter/automlbenchmark/amlb/job.py", line 115, in start
    result = self._run()
  File "/mnt/data/awilter/automlbenchmark/amlb/runners/container.py", line 108, in _run
    self._start_container("{framework} {benchmark} {constraint} {task_param} {folds_param} -Xseed={seed}".format(
  File "/mnt/data/awilter/automlbenchmark/amlb/runners/docker.py", line 73, in _start_container
    run_cmd(cmd, _capture_error_=False)  # console logs are written on stderr by default: not capturing allows live display
  File "/mnt/data/awilter/automlbenchmark/amlb/utils/process.py", line 245, in run_cmd
    raise e
  File "/mnt/data/awilter/automlbenchmark/amlb/utils/process.py", line 219, in run_cmd
    completed = run_subprocess(str_cmd if params.shell else full_cmd,
  File "/mnt/data/awilter/automlbenchmark/amlb/utils/process.py", line 77, in run_subprocess
    raise subprocess.CalledProcessError(retcode, process.args, output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command 'docker run --name mlplansklearn.test.test.docker.20220518T183639.zIBG03NxB4sbsopS82G7nA__ --shm-size=2048M -v /mnt/data/awilter/cache/openml:/input -v /mnt/data/awilter/automlbenchmark/results/mlplansklearn.test.test.docker.20220518T183639:/output -v /home/awilter/.config/automlbenchmark:/custom --rm automlbenchmark/mlplan:stable-dev MLPlanSKLearn test test -Xseed=auto -i /input -o /output -u /custom -s skip -Xrun_mode=docker --session=' returned non-zero exit status 125.
All jobs executed in 1.646 seconds.
[MONITORING] [docker.test.test.all_tasks.all_folds.MLPlanSKLearn] CPU Utilization: 65.3%
[MONITORING] [docker.test.test.all_tasks.all_folds.MLPlanSKLearn] Memory Usage: 17.8%
[MONITORING] [docker.test.test.all_tasks.all_folds.MLPlanSKLearn] Disk Usage: 83.9%
```
Essentially, the docker image `automlbenchmark/mlplan:stable-dev` is never created. The reason `MLPlanSKLearn` is failing (and any framework based on ML-Plan) is that mlplan.org is down, so the file `mlplan.zip` is never downloaded. Perhaps this framework's setup needs an update. Perhaps more here? https://mavenlibs.com/maven/dependency/ai.libs/mlplan-full
Anyway, all I wanted was an example framework that would use a GPU.
@mwever does the installation script need an update, or will mlplan.org be back up?
Oh, @fmohr maintains the server for mlplan.org; I will notify him about the outage, but it should be back up soon.
Regarding the use of GPUs: ML-Plan does not make use of GPU resources. Honestly speaking, I currently do not remember why we included the torch package at all; it must be some technical dependency. We only build pipelines with scikit-learn algorithms and xgboost, that's it. So I am sorry to disappoint you, @alanwilter, but ML-Plan is no reference for a framework using GPUs.
Also, there are other packages that have GPU support (e.g., for xgboost), like AutoGluon or H2O (I think). Though I don't know how accessible their configurations are from the benchmark framework; forwarding configurations from the framework definitions has some limitations at the moment, which also depend on the respective framework integration.
Sorry for the late reply (I was off for a month): @PGijsbers is right about H2O, it will detect GPUs and use them for xgboost. I don't know how it works with AutoGluon.
AutoGluon will auto-detect and use a GPU only if `hyperparameters='multimodal'` is set, for multimodal text/tabular/image data. You can also force GPUs for other models like LightGBM, XGBoost, and CatBoost by following the tutorials at auto.gluon.ai, but that is not easy to do with how AMLB currently works.
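As noted above, forwarding such configuration through AMLB has limitations, but custom framework definitions do let you pass params to an integration. A hypothetical sketch follows; whether `hyperparameters` is forwarded to AutoGluon's `fit()` unchanged is an assumption, so verify it against the AutoGluon integration in the repo before relying on this.

```yaml
# Hypothetical entry in a user config, e.g. ~/.config/automlbenchmark/frameworks.yaml.
# Assumption: params without a leading underscore are forwarded to the
# framework integration rather than consumed by amlb itself.
AutoGluon_multimodal:
  extends: AutoGluon
  params:
    hyperparameters: multimodal   # per the comment above, triggers GPU auto-detection
```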