[BUG] Running a MLflow project with docker_env fails to create the docker container.

Open Grisly00 opened this issue 4 years ago • 17 comments

System information

  • Have I written custom code (as opposed to using a stock example script provided in MLflow): https://github.com/mlflow/mlflow/tree/master/examples/docker
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Windows 10
  • MLflow installed from (source or binary): pip install mlflow
  • MLflow version (run mlflow --version): 1.6
  • Python version: 3.7.6
  • npm version, if running the dev UI:
  • Exact command to reproduce: mlflow run examples/docker -P alpha=0.5

Describe the problem

The example MLflow project (and my own as well), using a docker_env and run with the above command, throws a docker error.

Expected behavior: Python file is executed and tracked and run is added in mlruns.

Actual behavior: docker throws an error

docker: Error response from daemon: invalid mode: \git_repos\mlflow_example\mlruns\0\e6763b1645214c54bb5d606e3be72170\artifacts.

The problem seems to be that MLflow passes a -v flag to docker that maps a host directory to itself:

docker run --rm -v D:\git_repos\mlflow_example\mlruns:/mlflow/tmp/mlruns -v D:\git_repos\mlflow_example\mlruns\0\e6763b1645214c54bb5d606e3be72170\artifacts:D:\git_repos\mlflow_example\mlruns\0\e6763b1645214c54bb5d606e3be72170\artifacts -e MLFLOW_RUN_ID=e6763b1645214c54bb5d606e3be72170 -e MLFLOW_TRACKING_URI=file:///mlflow/tmp/mlruns -e MLFLOW_EXPERIMENT_ID=0 docker-example:93e3a50 python train.py --alpha 0.5 --l1-ratio 0.1
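To illustrate why the error message names the tail of the path as a "mode", here is a minimal sketch of a naive `source:target[:mode]` split. It is a simplified model for illustration only, not Docker's actual parser; the function name `split_volume_spec` is hypothetical. It assumes the parser keeps one leading drive letter with the source and then splits the rest on colons.

```python
import re


def split_volume_spec(spec: str):
    """Naively split 'source:target[:mode]', tolerating one leading drive letter.

    Illustrative only; this is NOT how Docker is implemented.
    """
    source_prefix = ""
    # Keep a leading Windows drive letter such as "D:" attached to the source.
    if re.match(r"^[A-Za-z]:\\", spec):
        source_prefix, spec = spec[:2], spec[2:]
    parts = spec.split(":")
    source = source_prefix + parts[0]
    target = parts[1] if len(parts) > 1 else ""
    # Anything after the second colon is interpreted as the mode.
    mode = ":".join(parts[2:])
    return source, target, mode


artifacts = r"D:\git_repos\mlflow_example\mlruns\0\e6763b1645214c54bb5d606e3be72170\artifacts"

# Second mount from the command above: host path mapped to the same host path.
source, target, mode = split_volume_spec(f"{artifacts}:{artifacts}")
print(target)  # D
print(mode)    # \git_repos\mlflow_example\mlruns\0\e6763b1645214c54bb5d606e3be72170\artifacts

# First mount: the POSIX target has no extra colon, so no bogus mode appears.
ok = split_volume_spec(r"D:\git_repos\mlflow_example\mlruns:/mlflow/tmp/mlruns")
print(ok[1], repr(ok[2]))  # /mlflow/tmp/mlruns ''
```

Under this model, the drive-letter colon in the target side is what pushes the rest of the path into the mode slot, which matches the text of the reported error.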

Code to reproduce issue

Simply follow the instructions in https://github.com/mlflow/mlflow/tree/master/examples/docker

Other info / logs

(CGa_env) D:\git_repos\mlflow_example>mlflow run examples/docker -P alpha=0.5
2020/02/27 16:31:49 INFO mlflow.projects: === Building docker image docker-example:93e3a50 ===
2020/02/27 16:31:49 INFO mlflow.projects: Temporary docker context file C:\Users\CC073~1.GAI\AppData\Local\Temp\tmpfp1uz6ee was not deleted.
2020/02/27 16:31:49 INFO mlflow.projects: === Created directory C:\Users\CC073~1.GAI\AppData\Local\Temp\tmp88nz0lmt for downloading remote URIs passed to arguments of type 'path' ===
2020/02/27 16:31:49 INFO mlflow.projects: === Running command 'docker run --rm -v D:\git_repos\mlflow_example\mlruns:/mlflow/tmp/mlruns -v D:\git_repos\mlflow_example\mlruns\0\e6763b1645214c54bb5d606e3be72170\artifacts:D:\git_repos\mlflow_example\mlruns\0\e6763b1645214c54bb5d606e3be72170\artifacts -e MLFLOW_RUN_ID=e6763b1645214c54bb5d606e3be72170 -e MLFLOW_TRACKING_URI=file:///mlflow/tmp/mlruns -e MLFLOW_EXPERIMENT_ID=0 docker-example:93e3a50 python train.py --alpha 0.5 --l1-ratio 0.1' in run with ID 'e6763b1645214c54bb5d606e3be72170' ===
docker: Error response from daemon: invalid mode: \git_repos\mlflow_example\mlruns\0\e6763b1645214c54bb5d606e3be72170\artifacts.
See 'docker run --help'.
2020/02/27 16:31:49 ERROR mlflow.cli: === Run (ID 'e6763b1645214c54bb5d606e3be72170') failed ===

Grisly00 avatar Feb 27 '20 15:02 Grisly00

Same issue on Windows 10 Pro 1903. The basic example from the MLflow documentation does not work.

MLFlow Project with Docker as Environment fails when used with 'mlflow run .'

Exception: docker: Error response from daemon: invalid mode: \Users\andre\code\mlflow1\productionfirst\mlruns\0\c4132f95210546f787f89591b0e6d00e\artifacts.

MLProject

name: productionfirst

docker_env:
    image:  mlflow-docker-example

entry_points:
  main:
    command: "python classifier.py"

Dockerfile

FROM continuumio/miniconda:4.5.4

RUN pip install "mlflow>=1.0" \
    && pip install numpy==1.14.3 \
    && pip install scipy \
    && pip install pandas==0.22.0 \
    && pip install scikit-learn==0.19.1 \
    && pip install cloudpickle \
    && pip install Keras \
    && pip install sklearn

Here is the cmd that is being executed:

2020/03/03 14:15:32 INFO mlflow.projects: === Running command 'docker run --rm -v C:\Users\andre\code\mlflow1\productionfirst\mlruns:/mlflow/tmp/mlruns -v C:\Users\andre\code\mlflow1\productionfirst\mlruns\0\9fd34ea8e7ed4e289a0d3c1b1b826fd8\artifacts:C:\Users\andre\code\mlflow1\productionfirst\mlruns\0\9fd34ea8e7ed4e289a0d3c1b1b826fd8\artifacts -e MLFLOW_RUN_ID=9fd34ea8e7ed4e289a0d3c1b1b826fd8 -e MLFLOW_TRACKING_URI=file:///mlflow/tmp/mlruns -e MLFLOW_EXPERIMENT_ID=0 productionfirst:latest python3 classifier.py' in run with ID '9fd34ea8e7ed4e289a0d3c1b1b826fd8' ===
docker: Error response from daemon: invalid mode: \Users\andre\code\mlflow1\productionfirst\mlruns\0\9fd34ea8e7ed4e289a0d3c1b1b826fd8\artifacts.

docker run mounts two volumes

  1. C:\Users\andre\code\mlflow1\productionfirst\mlruns
  2. C:\Users\andre\code\mlflow1\productionfirst\mlruns\0\9fd34ea8e7ed4e289a0d3c1b1b826fd8\artifacts

The second mount is

  • logically incorrect, because it maps a host path to the same host path (C:\user:C:\user) instead of mapping the host path to a container path (C:\user\:/home/user/app)
  • redundant since the files in mlruns are already covered with the first mount.

Removing the second mount and executing the command manually in Command Shell solves the issue. Will try to code a fix and create a PR later.
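One possible direction for such a fix, sketched under assumptions: translate the Windows host artifact directory into a POSIX path beneath the container directory that the first mount already targets (/mlflow/tmp/mlruns in the log above). The helper names `container_artifact_path` and `volume_args` are hypothetical and are not part of MLflow. Whether the run's recorded artifact URI would resolve to this container path is not established in this thread.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Container-side target of the first mount, taken from the logged command.
CONTAINER_MLRUNS = PurePosixPath("/mlflow/tmp/mlruns")


def container_artifact_path(host_mlruns: str, host_artifacts: str) -> str:
    """Map a Windows host artifact directory to a POSIX path inside the container.

    Hypothetical helper for illustration; not an MLflow function.
    """
    relative = PureWindowsPath(host_artifacts).relative_to(PureWindowsPath(host_mlruns))
    return str(CONTAINER_MLRUNS.joinpath(*relative.parts))


def volume_args(host_mlruns: str, host_artifacts: str) -> list:
    """Build a '-v host:container' pair whose target is a container path."""
    target = container_artifact_path(host_mlruns, host_artifacts)
    return ["-v", f"{host_artifacts}:{target}"]


mlruns = r"C:\Users\andre\code\mlflow1\productionfirst\mlruns"
artifacts = mlruns + r"\0\9fd34ea8e7ed4e289a0d3c1b1b826fd8\artifacts"

print(container_artifact_path(mlruns, artifacts))
# /mlflow/tmp/mlruns/0/9fd34ea8e7ed4e289a0d3c1b1b826fd8/artifacts
```

Because the artifacts directory sits under mlruns, the translated path lands inside the first mount, which is consistent with the observation that the second mount is redundant.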

Problem seems to lie in project/__init__.py line 654/655 and _get_local_artifact_cmd_and_envs line 804

artifact_cmds, artifact_envs = \
        _get_docker_artifact_storage_cmd_and_envs(active_run.info.artifact_uri)

AndreyBulezyuk avatar Mar 05 '20 07:03 AndreyBulezyuk

Working on a bugfix

AndreyBulezyuk avatar Mar 05 '20 07:03 AndreyBulezyuk

Same issue with fresh install of mlflow today while following the docker example from mlflow github repo.

daqieq avatar Jan 12 '21 00:01 daqieq

Still an issue for me as well. https://github.com/mlflow/mlflow/issues/1335#issuecomment-812686947

jwa5426 avatar Apr 02 '21 20:04 jwa5426

Same for me, still does not work as illustrated in MLflow documentation:

https://github.com/mlflow/mlflow/tree/master/examples/docker

aymutlu avatar Aug 24 '21 08:08 aymutlu

I am also facing the issue. I am following the Docker example as written in MLflow documentation https://github.com/mlflow/mlflow/tree/master/examples/docker

And getting this error upon running the project:

2021/11/16 13:05:12 INFO mlflow.projects.docker: === Building docker image docker-example:d6ae841 ===
2021/11/16 13:05:13 INFO mlflow.projects.docker: Temporary docker context file C:\Users\FARHAN~1\AppData\Local\Temp\tmp95rojz21 was not deleted.
2021/11/16 13:05:13 INFO mlflow.projects.utils: === Created directory C:\Users\FARHAN~1\AppData\Local\Temp\tmpr4ox7wqg for downloading remote URIs passed to arguments of type 'path' ===
2021/11/16 13:05:13 INFO mlflow.projects.backend.local: === Running command 'docker run --rm -v \mlruns.db:/mlflow/tmp/mlruns -v C:\Coding\learning\mlops\MLOPs_with_MLFlow\mlflow\mlflow\mlruns\0\8b92b73848a549e08911fddc54d3c5cb\artifacts:\mlflow\projects\code\mlruns\0\8b92b73848a549e08911fddc54d3c5cb\artifacts -e MLFLOW_RUN_ID=8b92b73848a549e08911fddc54d3c5cb -e MLFLOW_TRACKING_URI=sqlite:///C:/mlflow/tmp/mlruns -e MLFLOW_EXPERIMENT_ID=0 docker-example:d6ae841 python train.py --alpha 0.5 --l1-ratio 0.1' in run with ID '8b92b73848a549e08911fddc54d3c5cb' ===
docker: Error response from daemon: \mlruns.db%!(EXTRA string=is not a valid Windows path).
See 'docker run --help'.
2021/11/16 13:05:14 ERROR mlflow.cli: === Run (ID '8b92b73848a549e08911fddc54d3c5cb') failed ===

FarhanAhmad4473 avatar Nov 16 '21 08:11 FarhanAhmad4473

It's still an issue. I've tried it on both an example from Machine-Learning-Engineering-with-MLflow book & official MLflow-docker-example using mlflow run . & mlflow run . -P alpha=0.4 respectively (Windows 10, mlflow v1.24.0)

  • Book result:

2022/03/06 13:44:12 INFO mlflow.projects.backend.local: === Running command 'docker run --rm -v C:\Users\user_name\Projects\Machine-Learning-Engineering-with-Mlflow\Chapter01\stockpred\mlruns:/mlflow/tmp/mlruns -v C:\Users\user_name\Projects\Machine-Learning-Engineering-with-Mlflow\Chapter01\stockpred\mlruns\0\0fc34b0c7b8149e5b810d7dec0b8b304\artifacts:C:\Users\user_name\Projects\Machine-Learning-Engineering-with-Mlflow\Chapter01\stockpred\mlruns\0\0fc34b0c7b8149e5b810d7dec0b8b304\artifacts -e MLFLOW_RUN_ID=0fc34b0c7b8149e5b810d7dec0b8b304 -e MLFLOW_TRACKING_URI=file:///mlflow/tmp/mlruns -e MLFLOW_EXPERIMENT_ID=0 stockpred:c8bf265 python train.py' in run with ID '0fc34b0c7b8149e5b810d7dec0b8b304' ===
docker: Error response from daemon: invalid mode: \Users\user_name\Projects\Machine-Learning-Engineering-with-Mlflow\Chapter01\stockpred\mlruns\0\0fc34b0c7b8149e5b810d7dec0b8b304\artifacts.
See 'docker run --help'.
2022/03/06 13:44:12 ERROR mlflow.cli: === Run (ID '0fc34b0c7b8149e5b810d7dec0b8b304') failed ===

  • Official example result:

2022/03/06 13:50:03 INFO mlflow.projects.backend.local: === Running command 'docker run --rm -v C:\Users\user_name\Projects\mlflow-docker-example\mlruns:/mlflow/tmp/mlruns -v C:\Users\user_name\Projects\mlflow-docker-example\mlruns\0\4c3507f15cdb4581bed284cb3e4118ea\artifacts:C:\Users\user_name\Projects\mlflow-docker-example\mlruns\0\4c3507f15cdb4581bed284cb3e4118ea\artifacts -e MLFLOW_RUN_ID=4c3507f15cdb4581bed284cb3e4118ea -e MLFLOW_TRACKING_URI=file:///mlflow/tmp/mlruns -e MLFLOW_EXPERIMENT_ID=0 docker-example:latest python train.py --alpha 0.4 --l1-ratio 0.1' in run with ID '4c3507f15cdb4581bed284cb3e4118ea' ===
docker: Error response from daemon: invalid mode: \Users\user_name\Projects\mlflow-docker-example\mlruns\0\4c3507f15cdb4581bed284cb3e4118ea\artifacts.
See 'docker run --help'.
2022/03/06 13:50:03 ERROR mlflow.cli: === Run (ID '4c3507f15cdb4581bed284cb3e4118ea') failed ===

As AndreyBulezyuk mentioned, removing the second mount from the docker run command and executing it manually in Command Shell solves the issue, but it breaks the desired mlflow workflow.

MichalNawrot avatar Mar 06 '22 15:03 MichalNawrot

This issue has now been open for over 2 years(!) and is allegedly an easy fix. Is there any intention of solving it, or is there a different recommended approach?

Grisly00 avatar Mar 07 '22 08:03 Grisly00

I'm also facing the same issue. Why is container_path an absolute path? This will impact all Windows users.

https://github.com/mlflow/mlflow/blob/7c25e4ddd36d209d22488cdce699419430d74205/mlflow/projects/backend/local.py#L325-L332
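A minimal sketch of how a backslash-style container target such as the `\mlflow\projects\code\mlruns\0\...\artifacts` seen in FarhanAhmad4473's log could arise. It assumes, without confirmation from this thread, that the container path is assembled with `os.path` functions, which resolve to `ntpath` on Windows. The constant `CODE_DIR` and the variable names are illustrative. Both `ntpath` and `posixpath` are importable on any OS, so this runs anywhere.

```python
import ntpath
import posixpath
from pathlib import PureWindowsPath

# Container directory, inferred from the logged docker command (assumption).
CODE_DIR = "/mlflow/projects/code/"
relative_artifacts = r"mlruns\0\8b92b73848a549e08911fddc54d3c5cb\artifacts"

# What os.path.join + os.path.normpath would produce on a Windows host:
# normpath rewrites every forward slash as a backslash.
windows_style = ntpath.normpath(ntpath.join(CODE_DIR, relative_artifacts))
print(windows_style)
# \mlflow\projects\code\mlruns\0\8b92b73848a549e08911fddc54d3c5cb\artifacts

# Building the same path with POSIX rules, which is what a Linux container expects.
posix_style = posixpath.normpath(
    posixpath.join(CODE_DIR, PureWindowsPath(relative_artifacts).as_posix())
)
print(posix_style)
# /mlflow/projects/code/mlruns/0/8b92b73848a549e08911fddc54d3c5cb/artifacts
```

The design point is that the container side of a `-v` mapping describes a path in the container's filesystem, so it is safer to build it with `posixpath` (or `PurePosixPath`) regardless of the host OS.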

karthickme avatar Nov 24 '22 07:11 karthickme

I'm still facing this issue with mlflow v2.6.0 on Windows 10. Does anyone know a workaround?

lennartvandeguchte avatar Sep 06 '23 09:09 lennartvandeguchte

same, i have the mlflow version 2.9.2(latest) and i still face the error in windows, any workarounds/solutions? Thanks

JINO-ROHIT avatar Jan 10 '24 09:01 JINO-ROHIT

@lennartvandeguchte heres what i did, i have windows 11, i installed wsl for ubuntu and now its all good. occasionally it gets a bit buggy but thats alright ig

JINO-ROHIT avatar Jan 23 '24 15:01 JINO-ROHIT

@lennartvandeguchte heres what i did, i have windows 11, i installed wsl for ubuntu and now its all good. occasionally it gets a bit buggy but thats alright ig

Could you elaborate more please? I'm still facing this issue. Thanks!

mario-schiappacasse-ug avatar Apr 15 '24 16:04 mario-schiappacasse-ug

@mario-schiappacasse-ug what os are you using? and what trouble are you having?

JINO-ROHIT avatar Apr 15 '24 16:04 JINO-ROHIT

@JINO-ROHIT

I'm running Windows 11. While trying to run mlflow run, it gives the following error: docker: Error response from daemon: invalid mode: \Users\<user>\project\mlruns\0\<run-id>\artifacts. See 'docker run --help'.

In the MLproject i have defined a docker_env.

mario-schiappacasse-ug avatar Apr 15 '24 16:04 mario-schiappacasse-ug

@mario-schiappacasse-ug hey, the bug is that it doesn't work on Windows. You basically have two choices:

  1. use wsl on windows which gives you a linux environment on a windows machine.
  2. dual boot and run this on ubuntu.

JINO-ROHIT avatar Apr 15 '24 17:04 JINO-ROHIT

@JINO-ROHIT Thanks! Will try!

I'm currently trying with devcontainer running in debian. But for some reason mlflow is creating a broken volume for the artifacts.

mario-schiappacasse-ug avatar Apr 15 '24 18:04 mario-schiappacasse-ug