obstacle-tower-env

Cannot launch more than 65 environments

Open Holt59 opened this issue 6 years ago • 5 comments

I tried to launch 100 environments and got a UnityTimeoutException when creating the 66th one. I checked multiple times, and the exception always occurs on the 66th instantiation.

I am using gcloud with a K80 GPU, and GPU memory usage is below the available memory.

Holt59, Mar 08 '19 10:03

Hi @Holt59

One possibility is that the port for that environment is already taken for some reason. We start with port 5005 (worker_id=0) and increment from there, so I would suggest trying different worker IDs.
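As a way to test the port theory, here is a minimal sketch that probes which worker IDs map to a free local port. It assumes only what is stated above (port = 5005 + worker_id); the helper names `port_is_free` and `free_worker_ids` are hypothetical and not part of obstacle-tower-env. Probing on 127.0.0.1 is also an assumption, and a port that is free at probe time can still be taken before the environment starts.

```python
import socket

BASE_PORT = 5005  # from the thread: worker_id=0 uses port 5005


def port_is_free(port, host="127.0.0.1"):
    """Return True if a TCP socket can currently be bound to (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except (OSError, OverflowError):
            # OSError: port in use or not permitted.
            # OverflowError: port number outside 0-65535.
            return False
        return True


def free_worker_ids(how_many, start=0, base_port=BASE_PORT, limit=1000):
    """Collect up to `how_many` worker IDs whose port looks free right now.

    Scans at most `limit` consecutive IDs beginning at `start`.
    """
    ids = []
    for worker_id in range(start, start + limit):
        if port_is_free(base_port + worker_id):
            ids.append(worker_id)
            if len(ids) == how_many:
                break
    return ids


if __name__ == "__main__":
    ids = free_worker_ids(100)
    print(f"found {len(ids)} usable worker IDs, first few: {ids[:5]}")
```

If worker ID 65 (port 5070) is consistently missing from the result, that would point to a port conflict rather than a resource limit.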

If that doesn't seem to be the issue, another thing to try would be to add a wait time between environment launches. We've gotten reports that errors like this can occur when too many Unity processes are launched concurrently.
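The wait-between-launches idea could be sketched roughly as follows. `launch_staggered` is a hypothetical helper, and `make_env` is any callable that builds one environment from a worker ID; the default delay of one second is an assumption to tune, not a recommended value.

```python
import time


def launch_staggered(make_env, worker_ids, delay_s=1.0, sleep=time.sleep):
    """Create one environment per worker ID, pausing between launches.

    If any launch fails, the environments created so far are closed
    before the exception is re-raised.
    """
    envs = []
    try:
        for index, worker_id in enumerate(worker_ids):
            if index > 0:
                sleep(delay_s)  # give the previous Unity process time to start
            envs.append(make_env(worker_id))
    except Exception:
        for env in envs:
            env.close()
        raise
    return envs
```

With obstacle-tower-env this might be called as `launch_staggered(lambda wid: ObstacleTowerEnv(path_to_binary, worker_id=wid), range(100))`, where `path_to_binary` is a placeholder for the local executable path.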

awjuliani, Mar 08 '19 18:03

@Holt59 you may be running out of GPU memory. I've only been able to run 2x16 locally (16 per GPU; one is a 1080 with 8 GB, the other a 1060 with 6 GB). In the large-scale curiosity paper, they stated they were only able to get 40 Unity environments running (I can't remember whether it was on 4 or 8 GPUs).

Also, I use a one-second delay between launching each Unity instance.

Sohojoe, Mar 08 '19 20:03

@awjuliani I've already checked the port. I'll try adding a delay between launches.

@Sohojoe I have a 12 GB K80 and I am only starting the environments, with no extra algorithms running. And as I said, the GPU memory consumption (per nvidia-smi) is nowhere near its limit. I'll try the delay between launches.

Holt59, Mar 08 '19 21:03

@Holt59 - did you get around this? I found that some ports are in use on my PC, so I did a hardcoded hack to skip them.
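The actual hack is not shown in this thread. One way such a hardcoded skip list could look, assuming the port = 5005 + worker_id mapping mentioned earlier, is sketched below; `worker_ids_avoiding` and the example port numbers are hypothetical.

```python
from itertools import count, islice

BASE_PORT = 5005  # from the thread: worker_id=0 uses port 5005


def worker_ids_avoiding(blocked_ports, how_many, base_port=BASE_PORT):
    """Return the first `how_many` worker IDs whose port is not blocked."""
    blocked = set(blocked_ports)
    candidates = (wid for wid in count() if base_port + wid not in blocked)
    return list(islice(candidates, how_many))


if __name__ == "__main__":
    # Example: ports 5006 and 5008 are known to be in use on this machine.
    print(worker_ids_avoiding({5006, 5008}, 4))  # → [0, 2, 4, 5]
```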

Sohojoe, Mar 28 '19 00:03

@Sohojoe - I did not solve this, but I did not look into it much because I ran into other issues... I checked the ports on my computer and had nothing running on them, so I don't think that was the issue.

Holt59, Mar 28 '19 08:03