[HELP WANTED][BUG] Can't find Docker for multistep projects
Willingness to contribute
The MLflow Community encourages bug fix contributions. Would you or another member of your organization be willing to contribute a fix for this bug to the MLflow code base?
- [ ] Yes. I can contribute a fix for this bug independently.
- [ ] Yes. I would be willing to contribute a fix for this bug with guidance from the MLflow community.
- [x] No. I cannot contribute a bug fix at this time.
System information
- Have I written custom code (as opposed to using a stock example script provided in MLflow): Yes
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Arch Linux latest, but using Docker here
- MLflow installed from (source or binary): Docker
- MLflow version (run mlflow --version): latest
- Python version: 3.7 something
- Exact command to reproduce: mlflow run .
Describe the problem
Multistep workflow with Docker runs into mlflow.exceptions.ExecutionException: Could not find Docker executable.
Docker is clearly installed and should be available, since the run launched successfully and even reused the cached load_raw_data. However, subsequent entrypoints run into the exception.
Code to reproduce issue
https://github.com/Zethson/mlflow_custom_ms_example
This is a very slightly adapted version of the custom multistep example. Please build the Docker container as custom_ms_example and then simply run the project with the usual mlflow run .
Please be aware that you may run into subsequent errors such as a missing JAVA_HOME, since the container may not be complete yet, but at this point the run does not even get to that stage!
Other info / logs
zeth@master ~/P/custom_multistep [1]> mlflow run . (base)
2020/05/18 16:50:38 INFO mlflow.projects: === Building docker image multistep_example ===
2020/05/18 16:50:54 INFO mlflow.projects: === Created directory /tmp/tmpxwoywhi1 for downloading remote URIs passed to arguments of type 'path' ===
2020/05/18 16:50:54 INFO mlflow.projects: === Running command 'docker run --rm -v /home/zeth/PycharmProjects/custom_multistep/mlruns:/mlflow/tmp/mlruns -v /home/zeth/PycharmProjects/mlflow/examples/multistep_workflow/mlruns/0/d588d7bc4a174c8bb066748faeb88c5e/artifacts:/home/zeth/PycharmProjects/mlflow/examples/multistep_workflow/mlruns/0/d588d7bc4a174c8bb066748faeb88c5e/artifacts -e MLFLOW_RUN_ID=d588d7bc4a174c8bb066748faeb88c5e -e MLFLOW_TRACKING_URI=file:///mlflow/tmp/mlruns -e MLFLOW_EXPERIMENT_ID=0 multistep_example:latest python main.py --als-max-iter 10 --keras-hidden-units 20 --max-row-limit 100000' in run with ID 'd588d7bc4a174c8bb066748faeb88c5e' ===
Run matched, but has a different source version, so skipping (found=142abbbd6dbc3a9879854f8356f2d7e7d3270729, expected=None)
No matching run has been found.
Found existing run for entrypoint=load_raw_data and parameters={}
Launching new run for entrypoint=etl_data and parameters={'ratings_csv': 'file:///home/zeth/PycharmProjects/mlflow/examples/multistep_workflow/mlruns/0/ed8ba88063bc4ac8acd41a6ddf5bf8b7/artifacts/ratings-csv-dir', 'max_row_limit': 100000}
Traceback (most recent call last):
File "/opt/conda/envs/multistep/lib/python3.7/site-packages/mlflow/projects/__init__.py", line 700, in _validate_docker_installation
process.exec_cmd([docker_path, "--help"], throw_on_error=False)
File "/opt/conda/envs/multistep/lib/python3.7/site-packages/mlflow/utils/process.py", line 43, in exec_cmd
cwd=cwd, universal_newlines=True, **kwargs)
File "/opt/conda/envs/multistep/lib/python3.7/subprocess.py", line 800, in __init__
restore_signals, start_new_session)
File "/opt/conda/envs/multistep/lib/python3.7/subprocess.py", line 1551, in _execute_child
raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'docker': 'docker'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "main.py", line 105, in <module>
workflow()
File "/opt/conda/envs/multistep/lib/python3.7/site-packages/click/core.py", line 829, in __call__
return self.main(*args, **kwargs)
File "/opt/conda/envs/multistep/lib/python3.7/site-packages/click/core.py", line 782, in main
rv = self.invoke(ctx)
File "/opt/conda/envs/multistep/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/opt/conda/envs/multistep/lib/python3.7/site-packages/click/core.py", line 610, in invoke
return callback(*args, **kwargs)
File "main.py", line 86, in workflow
git_commit)
File "main.py", line 67, in _get_or_run
submitted_run = mlflow.run(".", entrypoint, parameters=parameters)
File "/opt/conda/envs/multistep/lib/python3.7/site-packages/mlflow/projects/__init__.py", line 291, in run
synchronous=synchronous, run_id=run_id)
File "/opt/conda/envs/multistep/lib/python3.7/site-packages/mlflow/projects/__init__.py", line 150, in _run
_validate_docker_installation()
File "/opt/conda/envs/multistep/lib/python3.7/site-packages/mlflow/projects/__init__.py", line 702, in _validate_docker_installation
raise ExecutionException("Could not find Docker executable. "
mlflow.exceptions.ExecutionException: Could not find Docker executable. Ensure Docker is installed as per the instructions at https://docs.docker.com/install/overview/.
2020/05/18 16:50:56 ERROR mlflow.cli: === Run (ID 'd588d7bc4a174c8bb066748faeb88c5e') failed ===
What component(s), interfaces, languages, and integrations does this bug affect?
Components
- [ ] area/artifacts: Artifact stores and artifact logging
- [ ] area/build: Build and test infrastructure for MLflow
- [ ] area/docs: MLflow documentation pages
- [x] area/examples: Example code
- [ ] area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
- [ ] area/models: MLmodel format, model serialization/deserialization, flavors
- [x] area/projects: MLproject format, project running backends
- [ ] area/scoring: Local serving, model deployment tools, spark UDFs
- [ ] area/tracking: Tracking Service, tracking client APIs, autologging
Interface
- [ ] area/uiux: Front-end, user experience, JavaScript, plotting
- [x] area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
- [ ] area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
- [ ] area/windows: Windows support
Language
- [ ] language/r: R APIs and clients
- [ ] language/java: Java APIs and clients
Integrations
- [ ] integrations/azure: Azure and Azure ML integrations
- [ ] integrations/sagemaker: SageMaker integrations
I would like to add that the container works in a non-multistep setting.
@Zethson thanks for filing this. Just to confirm: it looks like the "docker not found" exception is raised from within your docker container (i.e. within multistep_example:latest). If you start a shell in the container via docker run -it multistep_example:latest bash, is the docker executable present in the resulting container?
My suspicion is that docker is not installed within multistep_example:latest. (In general, invoking docker commands from within a docker container is a bit tricky, so I'm happy to brainstorm on how to make this easier if that suspicion turns out to be correct.)
Dear @smurching,
thank you for your swift response. Docker is not installed inside the Docker container and it's absolutely not supposed to be. My expectation is that every step of the multistep workflow is executed inside the (single) Docker container. I am not using some weird custom code, but am solely running your mlflow multistep example with a single Docker container.
Do you have a multistep mlflow example, which runs via a Docker container?
(I am well aware that the current multistep solution is temporary and will be replaced at the end of the year with a reasonable DAG solution, but for now I would like to get this to work well at least).
@Zethson I see two potential solutions to the problem:
1. Introduce a --no-docker option, which will allow for running entry points without trying to create a new docker container. You could use this option to run individual steps in your multistep workflow without trying to create a nested docker container.
2. Attempt to mount the host's docker socket when running docker containers for MLflow project execution as described in the StackOverflow post.
I think both of these would unblock your use case, but require code changes to MLflow. It might be possible to achieve 2) without code changes, I'll investigate. In general, I think I prefer solution 2, as running multistep docker projects would "just work", but it'd require some investigation (i.e. is it always possible to mount the docker socket / identify where it is on the host machine in a platform-independent way)?
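To make solution 2 concrete, here is a rough sketch of what a docker run invocation with the host's socket mounted could look like when assembled in Python. It is not MLflow's implementation: the function name sibling_docker_run_cmd is hypothetical, and /var/run/docker.sock is assumed as the socket location (the usual default on Linux), which is exactly the platform question raised above.

```python
from typing import List, Sequence

# Assumption: default Docker socket path on Linux hosts; other platforms may differ.
DOCKER_SOCKET = "/var/run/docker.sock"


def sibling_docker_run_cmd(
    image: str,
    command: Sequence[str],
    extra_volumes: Sequence[str] = (),
) -> List[str]:
    """Build a `docker run` argv that mounts the host's Docker socket.

    With the socket mounted, a docker client inside the container talks to the
    host daemon, so containers it starts are siblings rather than nested ones.
    Hypothetical helper for illustration; not an MLflow function.
    """
    cmd = ["docker", "run", "--rm", "-v", f"{DOCKER_SOCKET}:{DOCKER_SOCKET}"]
    for volume in extra_volumes:
        cmd += ["-v", volume]
    cmd.append(image)
    cmd += list(command)
    return cmd


if __name__ == "__main__":
    print(" ".join(sibling_docker_run_cmd("multistep_example:latest", ["python", "main.py"])))
```

Mounting the socket only provides access to the daemon; the container still needs a docker client binary available inside it for the nested call to succeed.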
Thanks!
I would also like to suggest solution 2, since it would play far more nicely with proposal https://github.com/mlflow/mlflow/issues/2850 .
Hi folks, I've added the help wanted label to this issue. It would be great to put together a PR that leverages Docker's -v flag to create sibling containers for multi-step Docker project workflows.
I should say that I am running into a similar issue, and a docker run -v option would be great.
Hi all, I have created a docker multistep project example based on the multistep_workflow one. It would be great if you could replicate it and validate my approach before doing a PR for that. You can find my example here.
In the example, volumes are set to execute docker within the container and to have the artifacts available for every new container created. You can find instructions to replicate the example in the README.
@dbczumar I can create a PR if someone from the community could check the example I created ☝🏽, which is functional, and provide me some guidance. There, I am using volumes set in the MLproject file to execute docker inside the docker container and share the mlruns folder.
Thank you @symeneses for finding this workaround. I needed to add a /usr/bin/docker:/usr/bin/docker volume mount to get it working.
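Since the workaround depends on host paths being mounted into the container, a small pre-flight check can catch a missing source path before the run starts. The sketch below is only illustrative: missing_mount_sources is a hypothetical helper, the socket entry is an assumed typical mount (only the /usr/bin/docker mount is named in this thread), and the parsing assumes POSIX-style host:container volume strings.

```python
import os
from typing import Iterable, List

# Volume specs in "host_path:container_path" form.
# The socket mount is an assumption; /usr/bin/docker is the mount mentioned above.
VOLUMES = [
    "/var/run/docker.sock:/var/run/docker.sock",
    "/usr/bin/docker:/usr/bin/docker",
]


def missing_mount_sources(volumes: Iterable[str]) -> List[str]:
    """Return the host paths from `volumes` that do not exist on this machine.

    Hypothetical helper for illustration; assumes POSIX paths without colons.
    """
    missing = []
    for spec in volumes:
        host_path = spec.split(":", 1)[0]
        if not os.path.exists(host_path):
            missing.append(host_path)
    return missing


if __name__ == "__main__":
    for path in missing_mount_sources(VOLUMES):
        print(f"host path not found, mount would not work as intended: {path}")
```

On hosts where the docker binary lives somewhere other than /usr/bin/docker, the mount source would need to be adjusted accordingly.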