pymarl

Problem running experiments using the Docker container

Open Aaricis opened this issue 3 years ago • 8 comments

I ran into this problem when running bash run.sh $GPU python3 src/main.py --config=qmix --env-config=sc2 with env_args.map_name=2s3z.

Launching container named 'zkg_pymarl_GPU_python3_XIdE' on GPU 'python3'
docker: Error response from daemon: OCI runtime create failed: container_linux.go:370: starting container process caused: process_linux.go:459: container init caused: Running hook #1:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: device error: python3: unknown device: unknown.
ERRO[0001] error waiting for container: context canceled

I don't know how to solve this error.

Aaricis avatar Dec 18 '20 08:12 Aaricis

same problem here

mw9385 avatar Jan 14 '21 09:01 mw9385

Hello, you can refer to this issue: #89. I posted my solution in the comments.

reubenwong97 avatar Jan 21 '21 04:01 reubenwong97

@reubenwong97 many thanks :)

mw9385 avatar Jan 21 '21 04:01 mw9385

This issue is caused by '$GPU'. Using a number (like '0' or '1') in its place makes the shell script work. However, CUDA still does not work inside the Docker container (nvidia-docker is installed). Any ideas about '$GPU'? Thanks!

Btw, CUDA works when I run python3 src/main.py --config=qmix --env-config=sc2 with env_args.map_name=2s3z directly (without Docker).

4ever-Rain avatar Feb 23 '21 03:02 4ever-Rain
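
The log in the original report says the container was launched on GPU 'python3', which suggests that $GPU was unset in the calling shell, so the next word took its place. The sketch below only approximates unquoted shell expansion in Python to show the effect. It assumes that run.sh treats its first argument as the GPU id (the log suggests this, but the thread does not show the script), and expand_like_shell is a hypothetical helper, not part of pymarl.

```python
import shlex
from string import Template


class _EmptyIfMissing(dict):
    """Mapping that returns '' for unknown names, like an unset shell variable."""

    def __missing__(self, key):
        return ""


def expand_like_shell(command, env):
    """Approximate unquoted shell expansion of `command`.

    Variables missing from `env` expand to the empty string, and the result is
    split into words, so an unset variable simply disappears from the argument list.
    """
    expanded = Template(command).substitute(_EmptyIfMissing(env))
    return shlex.split(expanded)


command = "bash run.sh $GPU python3 src/main.py --config=qmix"

# GPU unset: the word after run.sh is 'python3'.
unset_args = expand_like_shell(command, {})

# GPU set to an id reported by nvidia-smi: the word after run.sh is '0'.
set_args = expand_like_shell(command, {"GPU": "0"})
```

With GPU unset, unset_args[2] is 'python3', which matches the device name that nvidia-container-cli rejects in the error message. Setting the variable first (for example, export GPU=0) or passing the id directly avoids the shift.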

@4ever-Rain you may want to run nvidia-smi first to check the IDs of your GPUs, which you can use in place of $GPU. I have run into problems when the GPU was low on memory because of other tasks running on it. You can check whether it has memory available with nvidia-smi.

reubenwong97 avatar Feb 24 '21 12:02 reubenwong97
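
A minimal sketch of the check suggested above, assuming nvidia-smi is on the PATH and reports numeric memory values. The helper names parse_gpu_table and list_gpus are made up for illustration. The query flags are standard nvidia-smi options.

```python
import csv
import io
import subprocess

# Ask nvidia-smi for a compact CSV listing: one line per GPU, no header, no units.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,name,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_table(text):
    """Parse the CSV text produced by QUERY into a list of dicts (memory in MiB)."""
    gpus = []
    for fields in csv.reader(io.StringIO(text), skipinitialspace=True):
        if not fields:  # skip blank lines
            continue
        index, name, used, total = (field.strip() for field in fields)
        gpus.append(
            {
                "index": int(index),
                "name": name,
                "used_mib": int(used),  # assumes a numeric value, not "[N/A]"
                "total_mib": int(total),
            }
        )
    return gpus


def list_gpus():
    """Run nvidia-smi and return the parsed table; raises if the command fails."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_table(result.stdout)
```

Calling list_gpus() on the host returns one entry per GPU. The index values are the ids to pass in place of $GPU, and total_mib minus used_mib gives a rough idea of the free memory.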

@reubenwong97 Thanks for your advice. I'm sure my GPU is available and free. I have used a GPU id ('0') instead of '$GPU', but CUDA still does not work inside Docker. Meanwhile, I'm sure torch.cuda.is_available() returns True in the container. Maybe there is something wrong with 'run.sh'?

4ever-Rain avatar Feb 25 '21 03:02 4ever-Rain
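
Since torch.cuda.is_available() returning True does not show that a kernel can actually run, a slightly fuller check inside the container may help narrow things down. This is only a sketch: cuda_report is a hypothetical helper, and the thread never establishes the cause of the problem.

```python
import importlib.util


def cuda_report():
    """Collect basic facts about the PyTorch/CUDA setup in the current environment."""
    if importlib.util.find_spec("torch") is None:
        return {"torch_installed": False}

    import torch

    report = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        "built_for_cuda": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
        "device_count": torch.cuda.device_count(),
    }
    if report["cuda_available"]:
        # Move a small tensor to the GPU and wait for the work to finish.
        # If this step hangs, the problem is in running on the device,
        # not in detecting it.
        x = torch.ones(2, 2).cuda()
        torch.cuda.synchronize()
        report["sum_on_gpu"] = float(x.sum().item())
    return report
```

Running print(cuda_report()) both inside and outside the container makes it easier to compare the two environments, for example whether the PyTorch build in the image targets a CUDA version that the host driver supports.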

@4ever-Rain I have run into the same problem. When I call cuda() in the container, it hangs and never completes. Have you solved the problem?

FanScy avatar Mar 15 '21 14:03 FanScy

Yep. It works for me now. In the end I ran the code outside the container, after installing all the necessary packages into a conda virtual environment.

4ever-Rain avatar Mar 17 '21 09:03 4ever-Rain