seed_rl
How to run an agent locally
After following the instructions and running the command ./run_local.sh dmlab vtrace 2, I get a console showing the following:
root@985411f3a892:/seed_rl/docker# cat /tmp/seed_rl/instructions
Welcome to the SEED local training of dmlab with vtrace.
SEED uses tmux for easy navigation between different tasks involved
in the training process. To switch to a specific task, press CTRL+b, [tab id].
You can stop training at any time by executing '../stop_local.sh'
root@985411f3a892:/seed_rl/docker# python3 check_gpu.py 2> /dev/null
../stop_local.shroot@985411f3a892:/seed_rl/docker# ../stop_local.sh
What do I do next? How do I start the training and how do I monitor and evaluate the performance? Please help.
Training has already started. As the instructions say, press Ctrl+b followed by [tab id] to switch between the learner, the actors, etc.
We use tmux to show separate logs for every actor and the learner.
Thanks. However, I seem to be getting this error for both my learner and my actors. Can you tell me what is wrong?
Yes, sorry, I believe you ran into an issue I fixed two hours ago. Please pull the latest version of the repo and try again.
Thanks a lot. It seems to be running now. Can you please tell me how I can check the training progress?
Another thing I noticed is that the training stops making progress. Can you tell me why this is happening?
@kaustabpal Your error message includes "No CUDA-capable device detected", so you might check whether your machine has a GPU with CUDA support.
You can run nvidia-smi in your terminal. If the output includes something like CUDA Version XX.XX, you should be good to go.
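The nvidia-smi check can also be scripted. Below is a minimal sketch, assuming bash, grep, and the usual nvidia-smi header line containing "CUDA Version: XX.X"; the function names cuda_version_from_smi and check_cuda_gpu are hypothetical helpers, not part of seed_rl. Note that the version nvidia-smi prints is the highest CUDA version the installed driver supports, not necessarily an installed CUDA toolkit.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of seed_rl): detect a CUDA-capable GPU by
# looking for the "CUDA Version" field in nvidia-smi's header.

# Read nvidia-smi output on stdin and print the CUDA version (e.g. "11.2").
# Returns non-zero if no "CUDA Version" field is found.
cuda_version_from_smi() {
  local version
  version=$(grep -Eo 'CUDA Version: *[0-9]+(\.[0-9]+)*' \
    | grep -Eo '[0-9]+(\.[0-9]+)*$' \
    | head -n 1)
  [ -n "$version" ] || return 1
  printf '%s\n' "$version"
}

# Run nvidia-smi (if it is installed) and report whether a CUDA version shows up.
check_cuda_gpu() {
  if ! command -v nvidia-smi >/dev/null 2>&1; then
    echo "nvidia-smi not found: is the NVIDIA driver installed?" >&2
    return 1
  fi
  local version
  if version=$(nvidia-smi 2>/dev/null | cuda_version_from_smi); then
    echo "CUDA-capable GPU detected (driver supports CUDA $version)"
  else
    echo "nvidia-smi ran but reported no CUDA version" >&2
    return 1
  fi
}
```

Usage: source the snippet and run check_cuda_gpu; a non-zero exit status suggests the "No CUDA-capable device detected" error will persist on that machine.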
Sample output with CUDA GPU:
I get the error below when I run ./run_local.sh dmlab vtrace 2:
❯ ./run_local.sh dmlab vtrace 2
[+] Building 30.3s (6/13)
=> [internal] load build definition from Dockerfile.dmlab 0.0s
=> => transferring dockerfile: 3.02kB 0.0s
=> [internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
=> [internal] load metadata for docker.io/tensorflow/tensorflow:2.4.1-gpu 2.3s
=> [internal] load build context 6.3s
=> => transferring context: 99.28MB 6.3s
=> CACHED [1/9] FROM docker.io/tensorflow/tensorflow:2.4.1-gpu@sha256:03e706e09b0425bb4f634a644c5d869f3b6d6c027411ccca14c18719121d3064 0.0s
=> ERROR [2/9] RUN apt-get update && apt-get install -y curl zip unzip software-properties-common pkg-config g++-4.8 zlib1g-dev lua5.1 liblua5.1-0-dev 27.9s
------
> [2/9] RUN apt-get update && apt-get install -y curl zip unzip software-properties-common pkg-config g++-4.8 zlib1g-dev lua5.1 liblua5.1-0-dev libffi-dev gettext freeglut3 libsdl2-dev libosmesa6-dev libglu1-mesa libglu1-mesa-dev python3-dev build-essential git python-setuptools python3-pip libjpeg-dev tmux:
#5 3.804 Get:1 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 InRelease [1581 B]
#5 4.151 Ign:2 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64 InRelease
#5 4.172 Get:3 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64 Release [564 B]
#5 4.199 Get:4 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64 Release.gpg [833 B]
#5 4.303 Hit:5 http://archive.ubuntu.com/ubuntu bionic InRelease
#5 4.309 Get:6 http://security.ubuntu.com/ubuntu bionic-security InRelease [88.7 kB]
#5 4.604 Get:7 http://archive.ubuntu.com/ubuntu bionic-updates InRelease [88.7 kB]
#5 5.618 Get:8 http://archive.ubuntu.com/ubuntu bionic-backports InRelease [74.6 kB]
#5 7.039 Err:1 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 InRelease
#5 7.039 The following signatures couldn't be verified because the public key is not available: NO_PUBKEY A4B469963BF863CC
#5 8.727 Get:9 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64 Packages [73.8 kB]
#5 12.89 Get:10 http://security.ubuntu.com/ubuntu bionic-security/universe amd64 Packages [1533 kB]
#5 14.56 Get:11 http://archive.ubuntu.com/ubuntu bionic-updates/universe amd64 Packages [2310 kB]
#5 15.39 Get:12 http://security.ubuntu.com/ubuntu bionic-security/main amd64 Packages [2937 kB]
#5 16.49 Get:13 http://security.ubuntu.com/ubuntu bionic-security/multiverse amd64 Packages [22.8 kB]
#5 16.81 Get:14 http://security.ubuntu.com/ubuntu bionic-security/restricted amd64 Packages [1100 kB]
#5 16.95 Get:15 http://archive.ubuntu.com/ubuntu bionic-updates/multiverse amd64 Packages [29.9 kB]
#5 17.23 Get:16 http://archive.ubuntu.com/ubuntu bionic-updates/main amd64 Packages [3369 kB]
#5 18.06 Get:17 http://archive.ubuntu.com/ubuntu bionic-updates/restricted amd64 Packages [1141 kB]
#5 18.46 Get:18 http://archive.ubuntu.com/ubuntu bionic-backports/universe amd64 Packages [12.9 kB]
#5 18.75 Get:19 http://archive.ubuntu.com/ubuntu bionic-backports/main amd64 Packages [12.2 kB]
#5 19.89 Reading package lists...
#5 27.83 W: GPG error: https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY A4B469963BF863CC
#5 27.83 E: The repository 'https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 InRelease' is no longer signed.
------
executor failed running [/bin/bash -c apt-get update && apt-get install -y curl zip unzip software-properties-common pkg-config g++-4.8 zlib1g-dev lua5.1 liblua5.1-0-dev libffi-dev gettext freeglut3 libsdl2-dev libosmesa6-dev libglu1-mesa libglu1-mesa-dev python3-dev build-essential git python-setuptools python3-pip libjpeg-dev tmux]: exit code: 100
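The build fails because apt rejects NVIDIA's CUDA repository: the signing key A4B469963BF863CC is not in the image. NVIDIA rotated this repository key in 2022, and older CUDA-based images such as tensorflow/tensorflow:2.4.1-gpu ship only the old key. One commonly used workaround is to fetch the new key (3bf863cc.pub) before the failing apt-get update step. The sketch below patches the Dockerfile accordingly. It assumes the build file is docker/Dockerfile.dmlab and that the base image has gnupg available for apt-key; the function add_nvidia_key_to_dockerfile is a hypothetical helper, not something seed_rl provides.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of seed_rl). Assumption: fetching NVIDIA's
# rotated CUDA repo key (3bf863cc, matching NO_PUBKEY A4B469963BF863CC)
# before `apt-get update` lets the Ubuntu 18.04 based image build again.

KEY_URL="https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/3bf863cc.pub"

# Insert "RUN apt-key adv --fetch-keys $KEY_URL" right after the first FROM
# line of the given Dockerfile, unless that key URL is already present.
add_nvidia_key_to_dockerfile() {
  local dockerfile="$1"
  local tmp status=0
  if grep -qF "$KEY_URL" "$dockerfile"; then
    echo "key fetch already present in $dockerfile"
    return 0
  fi
  tmp=$(mktemp) || return 1
  # Copy every line; after the first FROM line, emit the extra RUN line.
  awk -v run="RUN apt-key adv --fetch-keys $KEY_URL" '
    { print }
    !done && /^FROM / { print run; done = 1 }
  ' "$dockerfile" > "$tmp" && cat "$tmp" > "$dockerfile" || status=1
  rm -f "$tmp"
  return "$status"
}
```

Usage, assuming the repository layout above: run add_nvidia_key_to_dockerfile docker/Dockerfile.dmlab from the repository root, then rerun ./run_local.sh dmlab vtrace 2. Writing the result back with cat keeps the Dockerfile's original permissions.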