seed_rl

How to run an agent locally

Open kaustabpal opened this issue 4 years ago • 7 comments

After following the instructions and running ./run_local.sh dmlab vtrace 2, I get a console showing the following:

root@985411f3a892:/seed_rl/docker# cat /tmp/seed_rl/instructions
Welcome to the SEED local training of dmlab with vtrace.
SEED uses tmux for easy navigation between different tasks involved
in the training process. To switch to a specific task, press CTRL+b, [tab id].
You can stop training at any time by executing '../stop_local.sh'
root@985411f3a892:/seed_rl/docker# python3 check_gpu.py 2> /dev/null
../stop_local.shroot@985411f3a892:/seed_rl/docker# ../stop_local.sh

What do I do next? How do I start the training and how do I monitor and evaluate the performance? Please help.

kaustabpal, Jul 07 '20 18:07

Training has already started. As the instructions say, press Ctrl+b, then [tab id], to switch to the learner, actors, etc.

We use tmux to show separate logs for every actor and the learner.

lespeholt, Jul 12 '20 08:07
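For readers new to tmux, the windows can also be inspected from a shell inside the container without attaching. This is a minimal sketch: the helper names are hypothetical, and the session name seed_rl and the target seed_rl:0 are assumptions, not something the thread confirms (run tmux ls to see the real names).

```shell
# List the windows of a tmux session as "index: name".
# Assumption: the session is called "seed_rl"; confirm with `tmux ls`.
list_seed_windows() {
  session="${1:-seed_rl}"
  tmux list-windows -t "$session" -F '#{window_index}: #{window_name}'
}

# Print the last N lines currently visible in a window, without attaching to it.
tail_window() {
  target="$1"
  lines="${2:-20}"
  tmux capture-pane -p -t "$target" | tail -n "$lines"
}

# Example usage (commented out because it needs a running session):
# list_seed_windows
# tail_window seed_rl:0 30
```

Inside an attached session, Ctrl+b followed by the window index does the same switching interactively, as described above.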

Thanks. However, I seem to be getting this error for my learner and actors. Can you tell me what is wrong? [Screenshot from 2020-07-13 11-36-52]

kaustabpal, Jul 13 '20 06:07

Yes, sorry, I believe you ran into an issue I fixed two hours ago. Please try pulling the latest version of the repo.

lespeholt, Jul 13 '20 19:07

Thanks a lot. It seems to be running now. Can you please tell me how I can check the training progress?

kaustabpal, Jul 13 '20 20:07

Another thing I notice is that the training stops progressing. Can you tell me why this is happening? [Screenshot from 2020-07-14 04-29-37]

kaustabpal, Jul 13 '20 23:07

@kaustabpal Your error message includes "No CUDA-capable device detected", so check whether your machine has a CUDA-capable GPU. Run nvidia-smi in your terminal; if the output includes something like "CUDA Version XX.XX", you should be good to go. Sample output with a CUDA GPU: [Kazam_screenshot_00052]

Freshchris01, Nov 14 '21 20:11
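The nvidia-smi check suggested above can be scripted. This is a minimal sketch, and has_cuda_gpu is a hypothetical helper name. It only tests whether nvidia-smi is on the PATH and lists at least one GPU, using the standard --query-gpu and --format options.

```shell
# Return 0 if nvidia-smi is available and reports at least one GPU, 1 otherwise.
has_cuda_gpu() {
  command -v nvidia-smi >/dev/null 2>&1 || return 1
  # nvidia-smi prints one line per GPU; empty output means none was found.
  [ -n "$(nvidia-smi --query-gpu=name --format=csv,noheader 2>/dev/null)" ]
}

if has_cuda_gpu; then
  echo "GPU visible"
else
  echo "no GPU visible"
fi
```

If the check passes on the host but fails inside the container, one possibility (an assumption about the setup, not confirmed in this thread) is that the container was started without GPU access.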

I get the error below when I run ./run_local.sh dmlab vtrace 2:

❯ ./run_local.sh dmlab vtrace 2
[+] Building 30.3s (6/13)                                                                                                                                                                           
 => [internal] load build definition from Dockerfile.dmlab                                                                                                                                     0.0s
 => => transferring dockerfile: 3.02kB                                                                                                                                                         0.0s
 => [internal] load .dockerignore                                                                                                                                                              0.0s
 => => transferring context: 2B                                                                                                                                                                0.0s
 => [internal] load metadata for docker.io/tensorflow/tensorflow:2.4.1-gpu                                                                                                                     2.3s
 => [internal] load build context                                                                                                                                                              6.3s
 => => transferring context: 99.28MB                                                                                                                                                           6.3s
 => CACHED [1/9] FROM docker.io/tensorflow/tensorflow:2.4.1-gpu@sha256:03e706e09b0425bb4f634a644c5d869f3b6d6c027411ccca14c18719121d3064                                                        0.0s
 => ERROR [2/9] RUN apt-get update && apt-get install -y     curl     zip     unzip     software-properties-common     pkg-config     g++-4.8     zlib1g-dev     lua5.1     liblua5.1-0-dev   27.9s
------                                                                                                                                                                                              
 > [2/9] RUN apt-get update && apt-get install -y     curl     zip     unzip     software-properties-common     pkg-config     g++-4.8     zlib1g-dev     lua5.1     liblua5.1-0-dev     libffi-dev     gettext     freeglut3     libsdl2-dev     libosmesa6-dev     libglu1-mesa     libglu1-mesa-dev     python3-dev     build-essential     git     python-setuptools     python3-pip     libjpeg-dev     tmux:
#5 3.804 Get:1 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64  InRelease [1581 B]
#5 4.151 Ign:2 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64  InRelease
#5 4.172 Get:3 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64  Release [564 B]
#5 4.199 Get:4 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64  Release.gpg [833 B]
#5 4.303 Hit:5 http://archive.ubuntu.com/ubuntu bionic InRelease
#5 4.309 Get:6 http://security.ubuntu.com/ubuntu bionic-security InRelease [88.7 kB]
#5 4.604 Get:7 http://archive.ubuntu.com/ubuntu bionic-updates InRelease [88.7 kB]
#5 5.618 Get:8 http://archive.ubuntu.com/ubuntu bionic-backports InRelease [74.6 kB]
#5 7.039 Err:1 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64  InRelease
#5 7.039   The following signatures couldn't be verified because the public key is not available: NO_PUBKEY A4B469963BF863CC
#5 8.727 Get:9 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64  Packages [73.8 kB]
#5 12.89 Get:10 http://security.ubuntu.com/ubuntu bionic-security/universe amd64 Packages [1533 kB]
#5 14.56 Get:11 http://archive.ubuntu.com/ubuntu bionic-updates/universe amd64 Packages [2310 kB]
#5 15.39 Get:12 http://security.ubuntu.com/ubuntu bionic-security/main amd64 Packages [2937 kB]
#5 16.49 Get:13 http://security.ubuntu.com/ubuntu bionic-security/multiverse amd64 Packages [22.8 kB]
#5 16.81 Get:14 http://security.ubuntu.com/ubuntu bionic-security/restricted amd64 Packages [1100 kB]
#5 16.95 Get:15 http://archive.ubuntu.com/ubuntu bionic-updates/multiverse amd64 Packages [29.9 kB]
#5 17.23 Get:16 http://archive.ubuntu.com/ubuntu bionic-updates/main amd64 Packages [3369 kB]
#5 18.06 Get:17 http://archive.ubuntu.com/ubuntu bionic-updates/restricted amd64 Packages [1141 kB]
#5 18.46 Get:18 http://archive.ubuntu.com/ubuntu bionic-backports/universe amd64 Packages [12.9 kB]
#5 18.75 Get:19 http://archive.ubuntu.com/ubuntu bionic-backports/main amd64 Packages [12.2 kB]
#5 19.89 Reading package lists...
#5 27.83 W: GPG error: https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64  InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY A4B469963BF863CC
#5 27.83 E: The repository 'https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64  InRelease' is no longer signed.
------
executor failed running [/bin/bash -c apt-get update && apt-get install -y     curl     zip     unzip     software-properties-common     pkg-config     g++-4.8     zlib1g-dev     lua5.1     liblua5.1-0-dev     libffi-dev     gettext     freeglut3     libsdl2-dev     libosmesa6-dev     libglu1-mesa     libglu1-mesa-dev     python3-dev     build-essential     git     python-setuptools     python3-pip     libjpeg-dev     tmux]: exit code: 100

ragyabraham, Aug 12 '22 08:08
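The key ID in the log, A4B469963BF863CC, ends in 3BF863CC. That matches the signing key NVIDIA published as 3bf863cc.pub when it rotated the keys for its CUDA apt repositories in 2022. The base image predates the rotation, so apt-get update rejects the repository. Below is a sketch of one possible workaround, not a fix confirmed by the maintainers. The helper names are hypothetical, and it assumes apt-key and network access are available at that build step. The fetch would need to run before apt-get update in Dockerfile.dmlab.

```shell
# Pull the missing key ID out of apt's error output (reads stdin).
missing_key_id() {
  sed -n 's/.*NO_PUBKEY \([0-9A-F]\{16\}\).*/\1/p' | head -n 1
}

# Sketch of the workaround: import NVIDIA's current key for the ubuntu1804
# CUDA repository. Must run as root, before `apt-get update`.
# The old key is left in place because the machine-learning repository in the
# log above still verifies correctly.
refresh_cuda_repo_key() {
  apt-key adv --fetch-keys \
    https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/3bf863cc.pub
}

# Check the helper against the line from the log above.
line="The following signatures couldn't be verified because the public key is not available: NO_PUBKEY A4B469963BF863CC"
echo "$line" | missing_key_id   # prints A4B469963BF863CC
```

In the Dockerfile this would be the apt-key command added as its own RUN step ahead of the failing one. Whether that is enough for this particular image is untested here.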