[BUG] Running a MLflow project with docker_env fails to create the docker container.
System information
- Have I written custom code (as opposed to using a stock example script provided in MLflow): https://github.com/mlflow/mlflow/tree/master/examples/docker
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Windows 10
- MLflow installed from (source or binary): pip install mlflow
- MLflow version (run mlflow --version): 1.6
- Python version: 3.7.6
- npm version, if running the dev UI:
- Exact command to reproduce: mlflow run examples/docker -P alpha=0.5
Describe the problem
The example MLflow project (and my own as well) uses a docker_env; running it with the command above throws a Docker error.
Expected behavior: Python file is executed and tracked and run is added in mlruns.
Actual behavior: docker throws an error
docker: Error response from daemon: invalid mode: \git_repos\mlflow_example\mlruns\0\e6763b1645214c54bb5d606e3be72170\artifacts.
The problem seems to be that mlflow passes a -v flag to docker that maps a host directory to itself: docker run --rm -v D:\git_repos\mlflow_example\mlruns:/mlflow/tmp/mlruns -v D:\git_repos\mlflow_example\mlruns\0\e6763b1645214c54bb5d606e3be72170\artifacts:D:\git_repos\mlflow_example\mlruns\0\e6763b1645214c54bb5d606e3be72170\artifacts -e MLFLOW_RUN_ID=e6763b1645214c54bb5d606e3be72170 -e MLFLOW_TRACKING_URI=file:///mlflow/tmp/mlruns -e MLFLOW_EXPERIMENT_ID=0 docker-example:93e3a50 python train.py --alpha 0.5 --l1-ratio 0.1
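To illustrate why the self-mapped mount ends in "invalid mode", here is a minimal Python sketch of how a host:container[:mode] volume spec could be split when the host side is a Windows path. It is a simplified model written for this thread, not Docker's actual parser, and split_volume_spec is a hypothetical name. The shortened paths stand in for the real ones.

```python
import re

def split_volume_spec(spec):
    """Simplified model (assumption, not Docker's real code) of splitting
    'host:container[:mode]' when the host side may start with a drive letter."""
    host_drive = ""
    # Peel off a leading Windows drive letter so its colon is not a separator.
    if re.match(r"^[A-Za-z]:\\", spec):
        host_drive, spec = spec[:2], spec[2:]
    parts = spec.split(":")
    host = host_drive + parts[0]
    container = parts[1] if len(parts) > 1 else None
    # Anything after the second separator is treated as the mount mode.
    mode = ":".join(parts[2:]) if len(parts) > 2 else None
    return host, container, mode

# First mount: Windows host path mapped to a Linux container path.
print(split_volume_spec(r"D:\repo\mlruns:/mlflow/tmp/mlruns"))
# Second mount: host path mapped to itself, so the second drive-letter
# colon is read as a separator and the remainder lands in the mode slot.
print(split_volume_spec(r"D:\repo\mlruns\0\abc\artifacts:D:\repo\mlruns\0\abc\artifacts"))
```

Under this model the second spec yields a container path of "D" and a "mode" that is the rest of the Windows path without its drive letter, which has the same shape as the path printed in the daemon's error message.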
Code to reproduce issue
Simply follow the instructions in https://github.com/mlflow/mlflow/tree/master/examples/docker
Other info / logs
(CGa_env) D:\git_repos\mlflow_example>mlflow run examples/docker -P alpha=0.5
2020/02/27 16:31:49 INFO mlflow.projects: === Building docker image docker-example:93e3a50 ===
2020/02/27 16:31:49 INFO mlflow.projects: Temporary docker context file C:\Users\CC073~1.GAI\AppData\Local\Temp\tmpfp1uz6ee was not deleted.
2020/02/27 16:31:49 INFO mlflow.projects: === Created directory C:\Users\CC073~1.GAI\AppData\Local\Temp\tmp88nz0lmt for downloading remote URIs passed to arguments of type 'path' ===
2020/02/27 16:31:49 INFO mlflow.projects: === Running command 'docker run --rm -v D:\git_repos\mlflow_example\mlruns:/mlflow/tmp/mlruns -v D:\git_repos\mlflow_example\mlruns\0\e6763b1645214c54bb5d606e3be72170\artifacts:D:\git_repos\mlflow_example\mlruns\0\e6763b1645214c54bb5d606e3be72170\artifacts -e MLFLOW_RUN_ID=e6763b1645214c54bb5d606e3be72170 -e MLFLOW_TRACKING_URI=file:///mlflow/tmp/mlruns -e MLFLOW_EXPERIMENT_ID=0 docker-example:93e3a50 python train.py --alpha 0.5 --l1-ratio 0.1' in run with ID 'e6763b1645214c54bb5d606e3be72170' ===
docker: Error response from daemon: invalid mode: \git_repos\mlflow_example\mlruns\0\e6763b1645214c54bb5d606e3be72170\artifacts.
See 'docker run --help'.
2020/02/27 16:31:49 ERROR mlflow.cli: === Run (ID 'e6763b1645214c54bb5d606e3be72170') failed ===
Same issue on Windows 10 Pro 1903. The basic example from the MLflow documentation does not work.
MLFlow Project with Docker as Environment fails when used with 'mlflow run .'
Exception:
docker: Error response from daemon: invalid mode: \Users\andre\code\mlflow1\productionfirst\mlruns\0\c4132f95210546f787f89591b0e6d00e\artifacts.
MLProject
name: productionfirst
docker_env:
  image: mlflow-docker-example
entry_points:
  main:
    command: "python classifier.py"
Dockerfile
FROM continuumio/miniconda:4.5.4
# Quote the version specifier; unquoted, the shell treats ">=1.0" as a redirect.
RUN pip install "mlflow>=1.0" \
    && pip install numpy==1.14.3 \
    && pip install scipy \
    && pip install pandas==0.22.0 \
    && pip install scikit-learn==0.19.1 \
    && pip install cloudpickle \
    && pip install Keras \
    && pip install sklearn
Here is the cmd that is being executed:
2020/03/03 14:15:32 INFO mlflow.projects: === Running command 'docker run --rm -v C:\Users\andre\code\mlflow1\productionfirst\mlruns:/mlflow/tmp/mlruns -v C:\Users\andre\code\mlflow1\productionfirst\mlruns\0\9fd34ea8e7ed4e289a0d3c1b1b826fd8\artifacts:C:\Users\andre\code\mlflow1\productionfirst\mlruns\0\9fd34ea8e7ed4e289a0d3c1b1b826fd8\artifacts -e MLFLOW_RUN_ID=9fd34ea8e7ed4e289a0d3c1b1b826fd8 -e MLFLOW_TRACKING_URI=file:///mlflow/tmp/mlruns -e MLFLOW_EXPERIMENT_ID=0 productionfirst:latest python3 classifier.py' in run with ID '9fd34ea8e7ed4e289a0d3c1b1b826fd8' ===
docker: Error response from daemon: invalid mode: \Users\andre\code\mlflow1\productionfirst\mlruns\0\9fd34ea8e7ed4e289a0d3c1b1b826fd8\artifacts.
docker run mounts two volumes
- C:\Users\andre\code\mlflow1\productionfirst\mlruns
- C:\Users\andre\code\mlflow1\productionfirst\mlruns\0\9fd34ea8e7ed4e289a0d3c1b1b826fd8\artifacts
The second mount is
- logically incorrect, because it maps a host path to the same host path (C:\user:C:\user) instead of to a container path (C:\user\:/home/user/app)
- redundant, since the files in mlruns are already covered by the first mount.
Removing the second mount and executing the command manually in Command Shell solves the issue. Will try to code a fix and create a PR later.
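As a rough sketch of what a corrected mapping could look like, the helper below translates the host artifact directory into a POSIX path under the container's mlruns mount. This is an illustration only, not the fix that went into MLflow: to_container_path is a hypothetical name, and the default /mlflow/tmp/mlruns is taken from the first -v flag in the logged command.

```python
from pathlib import PurePosixPath, PureWindowsPath

def to_container_path(host_artifact_dir, host_mlruns_dir,
                      container_mlruns_dir="/mlflow/tmp/mlruns"):
    """Hypothetical helper: map a Windows artifact directory to the matching
    POSIX path below the directory where mlruns is mounted in the container."""
    # relative_to raises ValueError if the artifact dir is not under mlruns.
    rel = PureWindowsPath(host_artifact_dir).relative_to(
        PureWindowsPath(host_mlruns_dir))
    # Re-join the relative parts with forward slashes on the container side.
    return str(PurePosixPath(container_mlruns_dir, *rel.parts))

host_mlruns = r"C:\Users\andre\code\mlflow1\productionfirst\mlruns"
host_artifacts = host_mlruns + r"\0\9fd34ea8e7ed4e289a0d3c1b1b826fd8\artifacts"
container_artifacts = to_container_path(host_artifacts, host_mlruns)

# A mount spec whose target is a Linux path rather than a Windows one.
print(f"-v {host_artifacts}:{container_artifacts}")
```

Because the artifact directory already lives under mlruns, the translated path points into the first mount, which is consistent with the observation that the second mount is redundant.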
The problem seems to lie in project/__init__.py, lines 654/655, and in _get_local_artifact_cmd_and_envs, line 804:
artifact_cmds, artifact_envs = \
    _get_docker_artifact_storage_cmd_and_envs(active_run.info.artifact_uri)
Working on a bugfix
Same issue with fresh install of mlflow today while following the docker example from mlflow github repo.
Still an issue for me as well. https://github.com/mlflow/mlflow/issues/1335#issuecomment-812686947
Same for me, still does not work as illustrated in MLflow documentation:
https://github.com/mlflow/mlflow/tree/master/examples/docker
I am also facing the issue. I am following the Docker example as written in MLflow documentation https://github.com/mlflow/mlflow/tree/master/examples/docker
And getting this error upon running the project:
2021/11/16 13:05:12 INFO mlflow.projects.docker: === Building docker image docker-example:d6ae841 ===
2021/11/16 13:05:13 INFO mlflow.projects.docker: Temporary docker context file C:\Users\FARHAN~1\AppData\Local\Temp\tmp95rojz21 was not deleted.
2021/11/16 13:05:13 INFO mlflow.projects.utils: === Created directory C:\Users\FARHAN~1\AppData\Local\Temp\tmpr4ox7wqg for downloading remote URIs passed to arguments of type 'path' ===
2021/11/16 13:05:13 INFO mlflow.projects.backend.local: === Running command 'docker run --rm -v \mlruns.db:/mlflow/tmp/mlruns -v C:\Coding\learning\mlops\MLOPs_with_MLFlow\mlflow\mlflow\mlruns\0\8b92b73848a549e08911fddc54d3c5cb\artifacts:\mlflow\projects\code\mlruns\0\8b92b73848a549e08911fddc54d3c5cb\artifacts -e MLFLOW_RUN_ID=8b92b73848a549e08911fddc54d3c5cb -e MLFLOW_TRACKING_URI=sqlite:///C:/mlflow/tmp/mlruns -e MLFLOW_EXPERIMENT_ID=0 docker-example:d6ae841 python train.py --alpha 0.5 --l1-ratio 0.1' in run with ID '8b92b73848a549e08911fddc54d3c5cb' ===
docker: Error response from daemon: \mlruns.db%!(EXTRA string=is not a valid Windows path).
See 'docker run --help'.
2021/11/16 13:05:14 ERROR mlflow.cli: === Run (ID '8b92b73848a549e08911fddc54d3c5cb') failed ===
It's still an issue. I've tried it with both an example from the Machine-Learning-Engineering-with-MLflow book and the official MLflow-docker-example, using mlflow run . and mlflow run . -P alpha=0.4 respectively (Windows 10, mlflow v1.24.0).
- Book result:
2022/03/06 13:44:12 INFO mlflow.projects.backend.local: === Running command 'docker run --rm -v C:\Users\user_name\Projects\Machine-Learning-Engineering-with-Mlflow\Chapter01\stockpred\mlruns:/mlflow/tmp/mlruns -v C:\Users\user_name\Projects\Machine-Learning-Engineering-with-Mlflow\Chapter01\stockpred\mlruns\0\0fc34b0c7b8149e5b810d7dec0b8b304\artifacts:C:\Users\user_name\Projects\Machine-Learning-Engineering-with-Mlflow\Chapter01\stockpred\mlruns\0\0fc34b0c7b8149e5b810d7dec0b8b304\artifacts -e MLFLOW_RUN_ID=0fc34b0c7b8149e5b810d7dec0b8b304 -e MLFLOW_TRACKING_URI=file:///mlflow/tmp/mlruns -e MLFLOW_EXPERIMENT_ID=0 stockpred:c8bf265 python train.py' in run with ID '0fc34b0c7b8149e5b810d7dec0b8b304' ===
docker: Error response from daemon: invalid mode: \Users\user_name\Projects\Machine-Learning-Engineering-with-Mlflow\Chapter01\stockpred\mlruns\0\0fc34b0c7b8149e5b810d7dec0b8b304\artifacts.
See 'docker run --help'.
2022/03/06 13:44:12 ERROR mlflow.cli: === Run (ID '0fc34b0c7b8149e5b810d7dec0b8b304') failed ===
- Official example result:
2022/03/06 13:50:03 INFO mlflow.projects.backend.local: === Running command 'docker run --rm -v C:\Users\user_name\Projects\mlflow-docker-example\mlruns:/mlflow/tmp/mlruns -v C:\Users\user_name\Projects\mlflow-docker-example\mlruns\0\4c3507f15cdb4581bed284cb3e4118ea\artifacts:C:\Users\user_name\Projects\mlflow-docker-example\mlruns\0\4c3507f15cdb4581bed284cb3e4118ea\artifacts -e MLFLOW_RUN_ID=4c3507f15cdb4581bed284cb3e4118ea -e MLFLOW_TRACKING_URI=file:///mlflow/tmp/mlruns -e MLFLOW_EXPERIMENT_ID=0 docker-example:latest python train.py --alpha 0.4 --l1-ratio 0.1' in run with ID '4c3507f15cdb4581bed284cb3e4118ea' ===
docker: Error response from daemon: invalid mode: \Users\user_name\Projects\mlflow-docker-example\mlruns\0\4c3507f15cdb4581bed284cb3e4118ea\artifacts.
See 'docker run --help'.
2022/03/06 13:50:03 ERROR mlflow.cli: === Run (ID '4c3507f15cdb4581bed284cb3e4118ea') failed ===
As AndreyBulezyuk mentioned, removing the second mount from the docker run command and executing it manually in Command Shell solves the issue, but it breaks the desired mlflow workflow.
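For anyone who wants to script that manual workaround, the following Python sketch filters a docker run argument list and drops every -v mount whose target is not a Linux-style absolute path. It is only a model of the manual step described above, with drop_windows_target_mounts as a hypothetical name; it does not change what mlflow run itself executes, and the shortened paths are placeholders.

```python
import re

def drop_windows_target_mounts(argv):
    """Hypothetical helper: return argv without '-v host:target' pairs whose
    target is not a POSIX absolute path (for example a host path mapped to itself)."""
    kept = []
    i = 0
    while i < len(argv):
        if argv[i] == "-v" and i + 1 < len(argv):
            spec = argv[i + 1]
            # Ignore the host drive letter's colon, then split host from target.
            tail = re.sub(r"^[A-Za-z]:", "", spec)
            _, _, target = tail.partition(":")
            if target.startswith("/"):
                kept.extend(argv[i:i + 2])
            i += 2
        else:
            kept.append(argv[i])
            i += 1
    return kept

command = [
    "docker", "run", "--rm",
    "-v", r"C:\proj\mlruns:/mlflow/tmp/mlruns",
    "-v", r"C:\proj\mlruns\0\abc\artifacts:C:\proj\mlruns\0\abc\artifacts",
    "-e", "MLFLOW_EXPERIMENT_ID=0",
    "docker-example:latest", "python", "train.py",
]
print(drop_windows_target_mounts(command))
```

The filtered list could be passed to subprocess.run, but since the run is then started outside of mlflow run, the caveat about breaking the intended workflow still applies.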
This issue has now been open for over 2 (!) years and is allegedly an easy fix. Is there any intention of solving this issue, or is there a different recommended approach?
I'm also facing the same issue. Why is container_path an absolute path? This will impact all Windows users.
https://github.com/mlflow/mlflow/blob/7c25e4ddd36d209d22488cdce699419430d74205/mlflow/projects/backend/local.py#L325-L332
I'm still facing this issue with mlflow v2.6.0 on Windows 10. Does anyone know a workaround?
Same here: I have mlflow version 2.9.2 (latest) and I still face the error on Windows. Any workarounds/solutions? Thanks
@lennartvandeguchte heres what i did, i have windows 11, i installed wsl for ubuntu and now its all good. occasionally it gets a bit buggy but thats alright ig
Could you elaborate more please? I'm still facing this issue. Thanks!
@mario-schiappacasse-ug what os are you using? and what trouble are you having?
@JINO-ROHIT
I'm running Windows 11. Running mlflow run gives the following error: docker: Error response from daemon: invalid mode: \Users\<user>\project\mlruns\0\<run-id>\artifacts. See 'docker run --help'.
In the MLproject i have defined a docker_env.
@mario-schiappacasse-ug hey, the bug is that it doesn't work on Windows. You basically have two choices:
- use wsl on windows which gives you a linux environment on a windows machine.
- dual boot and run this on ubuntu.
@JINO-ROHIT Thanks! Will try!
I'm currently trying with devcontainer running in debian. But for some reason mlflow is creating a broken volume for the artifacts.