clearml-agent

Parameters in clearml.conf not used during task run

Open • jax79sg opened this issue 3 years ago • 1 comment

Hi,

I started my agent with the following command:

    clearml-agent daemon --gpus 0 --queue gpu --docker --foreground

and the following parameters in clearml.conf:


    default_docker: {
        # default docker image to use when running in docker mode
        image: "dockerrepo/mydocker:custom"

        # optional arguments to pass to docker image
        # arguments: ["--ipc=host", ]
        arguments: ["--env GIT_SSL_NO_VERIFY=true",]
    }

While waiting for tasks, the agent prints:

    Worker "master-node:gpu0" - Listening to queues:
    +----------------------------------+------+-------+
    | id                               | name | tags  |
    +----------------------------------+------+-------+
    | 943fce37803044ef89f6d9af0cd5279c | gpu  |       |
    +----------------------------------+------+-------+

    Running in Docker  mode (v19.03 and above) - using default docker image: dockerrepo/mydocker:custom running python3

So far so good, except that when a task is pulled, I get the output below. Notice that the docker image has reverted to nvidia/cuda:10.1-runtime-ubuntu18.04, and there is no indication that the --env argument is passed on.

    task 228caa5d25d94ac5aa10fa7e1d02f03c pulled from 943fce37803044ef89f6d9af0cd5279c by worker master-node:gpu0
    Running task '228caa5d25d94ac5aa10fa7e1d02f03c'
    Storing stdout and stderr log to '/tmp/.clearml_agent_out.xmqr15w5.txt', '/tmp/.clearml_agent_out.xmqr15w5.txt'
    Running Task 228caa5d25d94ac5aa10fa7e1d02f03c inside docker: nvidia/cuda:10.1-runtime-ubuntu18.04
    Executing: ['docker', 'run', '-t', '--gpus', '"device=0"', '-e', 'CLEARML_WORKER_ID=master-node:gpu0', '-e', 'CLEARML_DOCKER_IMAGE=nvidia/cuda:10.1-runtime-ubuntu18.04', '-v', '/home/jax/.gitconfig:/root/.gitconfig', '-v', '/tmp/.clearml_agent.txivbuei.cfg:/root/clearml.conf', '-v', '/tmp/clearml_agent.ssh.04t66_qn:/root/.ssh', '-v', '/home/jax/.clearml/apt-cache:/var/cache/apt/archives', '-v', '/home/jax/.clearml/pip-cache:/root/.cache/pip', '-v', '/home/jax/.clearml/pip-download-cache:/root/.clearml/pip-download-cache', '-v', '/home/jax/.clearml/cache:/clearml_agent_cache', '-v', '/home/jax/.clearml/vcs-cache:/root/.clearml/vcs-cache', '--rm', 'nvidia/cuda:10.1-runtime-ubuntu18.04', 'bash', '-c', 'echo \'Binary::apt::APT::Keep-Downloaded-Packages "true";\' > /etc/apt/apt.conf.d/docker-clean ; chown -R root /root/.cache/pip ; export DEBIAN_FRONTEND=noninteractive ; apt-get update ; apt-get install -y git libsm6 libxext6 libxrender-dev libglib2.0-0 ; declare LOCAL_PYTHON ; for i in {10..5}; do which python3.$i && python3.$i -m pip --version && export LOCAL_PYTHON=$(which python3.$i) && break ; done ; [ ! -z $LOCAL_PYTHON ] || apt-get install -y python3-pip ; [ ! -z $LOCAL_PYTHON ] || export LOCAL_PYTHON=python3 ; $LOCAL_PYTHON -m pip install -U "pip<20.2" ; $LOCAL_PYTHON -m pip install -U clearml-agent ; cp /root/clearml.conf /root/default_clearml.conf ; NVIDIA_VISIBLE_DEVICES=all $LOCAL_PYTHON -u -m clearml_agent execute --disable-monitoring  --id 228caa5d25d94ac5aa10fa7e1d02f03c']
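To check exactly which environment variables reached docker (and which image the agent chose, via CLEARML_DOCKER_IMAGE), the logged Executing line can be parsed. This is a minimal sketch, assuming the agent prints the docker argv as a Python list literal, as it does in the output above; docker_env_vars is a hypothetical helper, not part of clearml-agent, and it does not distinguish docker options from arguments of the container command.

```python
import ast


def docker_env_vars(executing_line: str) -> dict:
    """Hypothetical helper: collect the variables passed with -e/--env
    in a logged "Executing: [...]" docker command."""
    # Everything after "Executing: " is assumed to be a Python list literal.
    argv = ast.literal_eval(executing_line.split("Executing: ", 1)[1])
    env = {}
    i = 0
    while i < len(argv):
        arg = argv[i]
        # Two-token form: '-e', 'KEY=VALUE' or '--env', 'KEY=VALUE'
        if arg in ("-e", "--env") and i + 1 < len(argv):
            key, _, value = argv[i + 1].partition("=")
            env[key] = value
            i += 2
            continue
        # Single-token form: '--env=KEY=VALUE'
        if arg.startswith("--env="):
            key, _, value = arg[len("--env="):].partition("=")
            env[key] = value
        i += 1
    return env
```

Run against the Executing line above, the result contains CLEARML_WORKER_ID and CLEARML_DOCKER_IMAGE (nvidia/cuda:10.1-runtime-ubuntu18.04) but no GIT_SSL_NO_VERIFY, confirming that the clearml.conf arguments were not applied.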

jax79sg, Mar 01 '21 04:03

Hi @jax79sg,

Check whether a docker image is configured in your task (you can see it in the UI under BASE DOCKER IMAGE). If one is, the agent uses the task's docker image and arguments.

The ClearML agent takes the image and arguments from the task; only if this section is empty in the task does it fall back to the docker image and other parameters configured in your ~/clearml.conf file.
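The precedence described above can be summarized in a short Python model. This is a simplified sketch of the rule as stated in this reply, not clearml-agent's actual code; DockerSettings and resolve_docker are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class DockerSettings:
    """Hypothetical container for a docker image plus its extra arguments."""
    image: Optional[str] = None
    arguments: List[str] = field(default_factory=list)


def resolve_docker(task: DockerSettings,
                   conf_default: DockerSettings) -> Tuple[Optional[str], List[str]]:
    # If the task defines a docker image, both the image and its
    # arguments come from the task.
    if task.image:
        return task.image, list(task.arguments)
    # Otherwise fall back to default_docker from clearml.conf.
    return conf_default.image, list(conf_default.arguments)
```

With the task carrying nvidia/cuda:10.1-runtime-ubuntu18.04, this model ignores both the clearml.conf image and its --env argument, which is consistent with the log in the question.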

JDennisJ, Mar 01 '21 07:03