
AzureML workspace environments build failing SDKV1

Open obiii opened this issue 1 year ago • 10 comments

  • Package Name: azureml-sdk
  • Package Version: V1
  • Operating System: Linux
  • Python Version:3.10

The bug: We have been building ML environments from Dockerfiles in an Azure ML workspace using a CI pipeline. It worked fine until today, when we tried to rebuild the environment with new conda dependencies.

The build job fails with a weird error, which seems to be an internal bug. Here is the screenshot of the job:

image

On examining logs, we see this:

image

image

The script above, "script.py", is Azure's internal script. I believe it's missing an exist_ok=True around mkdir(name, mode), which would explain why it complains that the file is already present.
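For readers unfamiliar with the flag, here is a minimal sketch of the behaviour being described. It assumes the failing line is the mkdir(name, mode) call inside Python's os.makedirs (the screenshots are not reproduced here, so that is a guess), and the directory name is hypothetical. Note that the parameter belongs to os.makedirs; os.mkdir has no such option.

```python
import os
import tempfile


def ensure_dir(path: str) -> None:
    """Create path (and any parents) without failing if it already exists."""
    os.makedirs(path, exist_ok=True)


with tempfile.TemporaryDirectory() as root:
    # Hypothetical directory name, used only for illustration.
    target = os.path.join(root, "image-build", "logs")

    os.makedirs(target)  # first call creates the directory tree

    try:
        os.makedirs(target)  # second call without exist_ok raises
        raised = False
    except FileExistsError:
        raised = True

    ensure_dir(target)  # tolerant variant: no exception on an existing directory
    still_there = os.path.isdir(target)
```

Whether Azure's internal script actually hits this code path cannot be confirmed from the thread; the sketch only shows why a repeated directory creation produces FileExistsError.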

We use az ml environment create --name "$azureEnvPrefix" --build-context $dockerContext --dockerfile-path Dockerfile --resource-group ${{parameters.resourceGroupName}} --workspace-name ${{parameters.workspaceName}} --tags "dev=$timestamp.CI" "ready_for=${{parameters.targetTag}}"

in a CI task to create envs.

obiii avatar Feb 22 '24 10:02 obiii

Hi @obiii - Thanks for opening an issue. We'll take a look asap. cc/ @azureml-github

swathipil avatar Feb 23 '24 15:02 swathipil

Hi @obiii, could you please provide more details and reproduction steps so we can investigate the root cause?

isaudagar avatar Feb 29 '24 12:02 isaudagar

Hi, I assume you mean the Docker files. Please let me know otherwise. The Docker files are generated by a CI pipeline that I cannot share, but here are the resulting files that the CI pipeline uses to build images:

Dockerfile

ARG CONDA_FILE
ARG IMAGE_NAME
FROM mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04

RUN echo "CONDA_FILE: conda_dependencies/preprocess_conda_dependencies.yml"
RUN echo "IMAGE_NAME: mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04"

COPY conda_dependencies/preprocess_conda_dependencies.yml conda_env.yml

RUN rm /bin/sh && ln -s /bin/bash /bin/sh
RUN echo "source /opt/miniconda/etc/profile.d/conda.sh && conda activate" >> ~/.bashrc

RUN cat conda_env.yml

RUN source /opt/miniconda/etc/profile.d/conda.sh && \
    conda activate && \
    conda install conda && \
    pip install cmake && \
    conda env update -f conda_env.yml

conda_dependencies/preprocess_conda_dependencies.yml

channels:
  - anaconda
  - conda-forge
  - defaults
dependencies:
  - python=3.11
  - pip:
      - numpy==1.23.5
      - pandas==2.1.4
      - prophet==1.1.5
      - SQLAlchemy==2.0.23
      - urllib3==2.1.0
      - pyodbc==5.0.1
      - mlflow==2.9.1
      - azureml-mlflow==1.54.0.post1
      - azureml-core==1.54.0.post1
      - azureml-dataset-runtime==1.54.0.post1

The command inside the CI pipeline to build the images: az ml environment create --name "$azureEnvPrefix" --build-context $dockerContext --dockerfile-path Dockerfile --resource-group ${{parameters.resourceGroupName}} --workspace-name ${{parameters.workspaceName}}

You can provide values for the argument according to your setup. For us the file structure is as follows:

projectName/
    ml_service/
        docker/
            Dockerfile
            conda_dependencies/
                preprocess_conda_dependencies.yml

dockerBaseDir="ml_service/docker"
dockerContext="$(System.DefaultWorkingDirectory)/$dockerBaseDir"
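As a rough pre-flight check for a setup like this, the sketch below verifies that the build context contains the Dockerfile and the conda file that the Dockerfile copies, then assembles the same CLI call as an argument list. The helper names and the example-env, example-rg and example-ws values are hypothetical; only the flags shown earlier in this thread are used, and the command is not executed.

```python
import tempfile
from pathlib import Path
from typing import List


def missing_from_context(context: Path, relative_paths: List[str]) -> List[str]:
    """Return the relative paths that are not regular files under the build context."""
    return [rel for rel in relative_paths if not (context / rel).is_file()]


def environment_create_command(name: str, context: Path, dockerfile: str,
                               resource_group: str, workspace: str) -> List[str]:
    """Assemble the CLI call from this thread as an argument list (not executed here)."""
    return [
        "az", "ml", "environment", "create",
        "--name", name,
        "--build-context", str(context),
        "--dockerfile-path", dockerfile,
        "--resource-group", resource_group,
        "--workspace-name", workspace,
    ]


# Recreate the layout from the thread in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    context = Path(tmp) / "ml_service" / "docker"
    conda_rel = "conda_dependencies/preprocess_conda_dependencies.yml"
    (context / "conda_dependencies").mkdir(parents=True)
    (context / "Dockerfile").write_text(
        "FROM mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04\n"
    )

    # The conda file is not there yet, so it is reported as missing.
    missing_before = missing_from_context(context, ["Dockerfile", conda_rel])

    (context / conda_rel).write_text("dependencies:\n  - python=3.11\n")
    missing_after = missing_from_context(context, ["Dockerfile", conda_rel])

    # Placeholder names; substitute your own environment, resource group and workspace.
    command = environment_create_command(
        "example-env", context, "Dockerfile", "example-rg", "example-ws"
    )
```

A check like this only rules out a broken build context; it says nothing about failures inside the workspace image-build job itself.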

obiii avatar Feb 29 '24 14:02 obiii

Same problem here,

Any ideas?

edgBR avatar Mar 04 '24 08:03 edgBR

Hi @isaudagar , is there any update on this please?

obiii avatar Mar 04 '24 09:03 obiii

Hello @obiii, I have been trying to run the above files and am getting some errors. Can you please provide the contents of the conda.sh file?

Junnu-akhila avatar Mar 04 '24 12:03 Junnu-akhila

Hi @Junnu-akhila , Here is the conda.sh file.

export CONDA_EXE='/opt/miniconda/bin/conda'
export _CE_M=''
export _CE_CONDA=''
export CONDA_PYTHON_EXE='/opt/miniconda/bin/python'

# Copyright (C) 2012 Anaconda, Inc
# SPDX-License-Identifier: BSD-3-Clause
__conda_exe() (
    "$CONDA_EXE" $_CE_M $_CE_CONDA "$@"
)

__conda_hashr() {
    if [ -n "${ZSH_VERSION:+x}" ]; then
        \rehash
    elif [ -n "${POSH_VERSION:+x}" ]; then
        :  # pass
    else
        \hash -r
    fi
}

__conda_activate() {
    if [ -n "${CONDA_PS1_BACKUP:+x}" ]; then
        # Handle transition from shell activated with conda <= 4.3 to a subsequent activation
        # after conda updated to >= 4.4. See issue #6173.
        PS1="$CONDA_PS1_BACKUP"
        \unset CONDA_PS1_BACKUP
    fi
    \local ask_conda
    ask_conda="$(PS1="${PS1:-}" __conda_exe shell.posix "$@")" || \return
    \eval "$ask_conda"
    __conda_hashr
}

__conda_reactivate() {
    \local ask_conda
    ask_conda="$(PS1="${PS1:-}" __conda_exe shell.posix reactivate)" || \return
    \eval "$ask_conda"
    __conda_hashr
}

conda() {
    \local cmd="${1-__missing__}"
    case "$cmd" in
        activate|deactivate)
            __conda_activate "$@"
            ;;
        install|update|upgrade|remove|uninstall)
            __conda_exe "$@" || \return
            __conda_reactivate
            ;;
        *)
            __conda_exe "$@"
            ;;
    esac
}

if [ -z "${CONDA_SHLVL+x}" ]; then
    \export CONDA_SHLVL=0
    # In dev-mode CONDA_EXE is python.exe and on Windows
    # it is in a different relative location to condabin.
    if [ -n "${_CE_CONDA:+x}" ] && [ -n "${WINDIR+x}" ]; then
        PATH="$(\dirname "$CONDA_EXE")/condabin${PATH:+":${PATH}"}"
    else
        PATH="$(\dirname "$(\dirname "$CONDA_EXE")")/condabin${PATH:+":${PATH}"}"
    fi
    \export PATH

    # We're not allowing PS1 to be unbound. It must at least be set.
    # However, we're not exporting it, which can cause problems when starting a second shell
    # via a first shell (i.e. starting zsh from bash).
    if [ -z "${PS1+x}" ]; then
        PS1=
    fi
fi

obiii avatar Mar 05 '24 15:03 obiii

Hi, in case you need the image build logs: build_log.txt

Please let me know if there is any update. Thanks :)

obiii avatar Mar 06 '24 09:03 obiii

Hi @obiii

The FileExistsError typically occurs in Python when you attempt to create a file or directory that already exists. However, looking at the command you provided, it seems that the error might not be directly related to file creation.

In your Dockerfile, you're using conda env update -f conda_env.yml command to update a Conda environment using a YAML file (conda_env.yml). This error might occur if one of the packages specified in conda_env.yml is already installed in the environment or if the environment itself already exists.

Here are a few things to check and troubleshoot:

  • Check Environment Existence: Ensure that the Conda environment specified in the conda_env.yml file exists before attempting to update it. You can use conda env list to see the list of existing environments.
  • Check Package Versions: If a package specified in conda_env.yml is already installed but with a different version, Conda might raise an error. Make sure the versions specified in the YAML file are compatible with the current environment.
  • Clean Environment: If you're okay with removing the existing environment and recreating it from scratch, you can use conda env remove -n <environment_name> to remove the existing environment before running the update command.
  • Check File Paths: Ensure that the conda_env.yml file is located in the correct directory and that the path is correctly specified in the RUN command.
  • Permissions: Ensure that the user running the Dockerfile has the necessary permissions to create and modify Conda environments and install packages.

Junnu-akhila avatar Mar 06 '24 15:03 Junnu-akhila

Hi,

But it's not our Dockerfile that is problematic. It's Azure's internal script. Please look at the screenshot, it says: "/azureml-envs/image-build/lib/python3.8/os.py"

The image works when built locally: I have built the Docker image using the same Dockerfile, and it builds and runs fine.

I have tried changing the dependencies and trimming them to just one or two, and even changing the base image to docker/python:3.11, but nothing works. This Docker setup is also used in our other projects, which I fear will break if we ever try to rebuild their environments. The environment we are talking about was building fine a month ago, and I only added a single pycountry dependency.

obiii avatar Mar 08 '24 10:03 obiii

Hi @obiii,

Previously I faced some issues with /opt/miniconda/profile.d/conda.sh. That is now resolved. The investigation is in progress, and I will update you once I find anything. image

Junnu-akhila avatar Mar 14 '24 19:03 Junnu-akhila

Hi @obiii,

I have run the Dockerfile successfully, but while creating the ML environment I'm getting the error below. The investigation is in progress.

image

Junnu-akhila avatar Mar 19 '24 18:03 Junnu-akhila

Hi @Junnu-akhila, just to update you, the same works if the environment is built in the registry instead of the workspace, not sure why tho!

obiii avatar Mar 19 '24 20:03 obiii

Hi @obiii

May I know which Python version you are using? The Dockerfile ran successfully after I changed the Python version from 3.11 to 3.10 in the preprocess_conda_dependencies.yml file. Could you please try with Python 3.10?

Thank you.

Junnu-akhila avatar Mar 21 '24 10:03 Junnu-akhila

Hi @obiii ,

Create a my_env.yml file with the content below:

$schema: https://azuremlschemas.azureedge.net/latest/environment.schema.json name: newdockerenv build: path: ml_service/docker

If you get any ml extension issue, please run this command: az extension add --name azure-cli-ml

You can use the command below to create the ML environment in the workspace: az ml environment create --file my_env.yml --resource-group <your RG name> --workspace-name

image

If you run into any issues, please let me know. Thank you.

Junnu-akhila avatar Mar 21 '24 14:03 Junnu-akhila

Hi @obiii, Could you please confirm, is this issue resolved or not?

Junnu-akhila avatar Mar 25 '24 12:03 Junnu-akhila

az ml environment create --file my_env.yml --resource-group --workspace-name

I tried your method above: It doesn't work. This is the trace:

The yaml file you provided does not match the prescribed schema for Environment yaml files and/or has the following issues:

Error:

  1. A least one unrecognized parameter is specified

Details: Validation for EnvironmentSchema failed

(x) build:

  • Field may not be null.

(x) path:

  • Unknown field.

Resolutions:

  1. Remove any parameters not prescribed by the Environment schema. Visit this link to refer to the Environment schema if needed: https://aka.ms/ml-cli-v2-environment-yaml-reference. If using the CLI, you can also check the full log in debug mode for more details by adding --debug to the end of your command

Also, even if this works, we do not have my_env.yaml files for all the environments. We have a docker.template file that gets filled in according to which environment is being built, and it produces a Dockerfile that is used with the az ml command as explained above (referenced).

A Dockerfile produced from the template for a specific preprocess environment looks as follows:

# Start with a base image, for example:
# FROM mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04

# Use the provided environment variables for conda and environment file paths
ARG CONDA_FILE
ARG IMAGE_NAME
FROM mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04

RUN echo "CONDA_FILE: conda_dependencies/preprocess_conda_dependencies.yml"
RUN echo "IMAGE_NAME: mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04"

COPY conda_dependencies/preprocess_conda_dependencies.yml conda_env.yml

RUN rm /bin/sh && ln -s /bin/bash /bin/sh
RUN echo "source /opt/miniconda/etc/profile.d/conda.sh && conda activate" >> ~/.bashrc

RUN cat conda_env.yml

RUN source /opt/miniconda/etc/profile.d/conda.sh && \
    conda activate && \
    conda install conda && \
    pip install cmake && \
    conda env update -f conda_env.yml --prune

We want to use such Dockerfiles to build the environments: az ml environment create --name "$azureEnvPrefix" --build-context $dockerContext --dockerfile-path Dockerfile --resource-group <someName> --workspace-name <someName>

obiii avatar Mar 25 '24 14:03 obiii

Hi @obiii

May I know which python version you are using? Docker file run successfully, After updating python version 3.11 to 3.10 in preprocess_conda_dependencies.yml file. Could you please try with python3.10 version.

Thank you.

Hi, tried this, doesn't work. Results in same error. image

obiii avatar Mar 25 '24 14:03 obiii

Hi @obiii ,

You need to add some space after build in the my_env.yml file:

$schema: https://azuremlschemas.azureedge.net/latest/environment.schema.json
name: newdockerenv
build:
    path: ml_service/docker
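To make the indentation point concrete, here is a small sketch that renders such a file with path nested under build. The render_env_yaml helper is hypothetical and not part of any Azure tooling; it uses only the three fields quoted in this thread and could be called once per environment from a pipeline that already fills in templates.

```python
def render_env_yaml(name: str, build_path: str) -> str:
    """Render a minimal environment YAML with `path` nested under `build`."""
    lines = [
        "$schema: https://azuremlschemas.azureedge.net/latest/environment.schema.json",
        f"name: {name}",
        "build:",
        # The leading spaces make `path` a child of `build`. Without them,
        # `build` is null and `path` becomes an unknown top-level field.
        f"    path: {build_path}",
    ]
    return "\n".join(lines) + "\n"


text = render_env_yaml("newdockerenv", "ml_service/docker")
path_line = text.splitlines()[3]
```

The validation trace earlier in the thread ("build: Field may not be null", "path: Unknown field") is consistent with a file in which path was not indented.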

Using the command below, I successfully created the ML environment:

az ml environment create --file my_env.yml --resource-group --workspace-name

image

I was also able to create the environment using your command:

az ml environment create --name "$azureEnvPrefix" --build-context $dockerContext --dockerfile-path Dockerfile --resource-group --workspace-name

image

image

image

With Python 3.11, we get the error: process 'python' exited with status code 1. image

With Python 3.10, the job succeeds. image

Could you please try the process above?

Thank you.

Junnu-akhila avatar Mar 27 '24 13:03 Junnu-akhila

Hi @obiii, can you create the my_env.yml file as shown in the screenshot below? You need to put four spaces in front of path in the build section. We tried it two ways, and the environments were created. image

If you run into any issues after using this approach, we will work on them.

shekshavalicentific avatar Mar 27 '24 13:03 shekshavalicentific

Hi @obiii Could you please confirm, is this issue resolved or not?

Junnu-akhila avatar Apr 02 '24 05:04 Junnu-akhila

Hi @obiii Could you please confirm, is this issue resolved or not?

Junnu-akhila avatar Apr 03 '24 14:04 Junnu-akhila

Hi @obiii. Thank you for opening this issue and giving us the opportunity to assist. To help our team better understand your issue and the details of your scenario please provide a response to the question asked above or the information requested above. This will help us more accurately address your issue.

github-actions[bot] avatar Apr 03 '24 14:04 github-actions[bot]

Hi @Junnu-akhila

Sorry for the late response. I tried it, and it creates the environment, but the environment doesn't build:

image

On checking logs I see:

image image image

obiii avatar Apr 04 '24 10:04 obiii

Hi @obiii, per your errors, you need to provide the Dockerfile path within the Docker build context. Could you try again after that? Thank you.

Junnu-akhila avatar Apr 04 '24 13:04 Junnu-akhila

Hi @Junnu-akhila, I realized that. Thanks for the correction. It still fails, and even if it did not fail, this solution is not what we are looking for. We cannot create env.yml files for each environment and use the az command to create environments from them.

The build logs show a link to a job, which is: image

I am not sure why it doesn't build the env in workspace. For now, we have decided to use a shared ML registry for building environments. And interestingly, the same CI pipeline, dockerfile, and context, same code successfully builds the environment in the registry.

obiii avatar Apr 04 '24 15:04 obiii

Hi @obiii,

We tried what you suggested, and we are able to create an ML environment. As you said, you are now using ML registries for ML environments, and that is working fine. Shall I close this? If you need anything, we will work on it. Thank you.

Junnu-akhila avatar Apr 08 '24 18:04 Junnu-akhila