stable-baselines
Read errors when running PPO1 with MPI
When running PPO1 with MPI on my local computer, it works fine.
However, when running it on a remote server, I get the following warning on start:
[[42442,1],0]: A high-performance Open MPI point-to-point messaging module
was unable to find any relevant network interfaces:
Module: OpenFabrics (openib)
Host: myhost
Another transport will be used instead, although this may result in
lower performance.
NOTE: You can disable this warning by setting the MCA parameter
btl_base_warn_component_unused to 0.
--------------------------------------------------------------------------
[myhost:00531] 7 more processes have sent help message help-mpi-btl-base.txt / btl:no-nics
[myhost:00531] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
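(The warning itself is harmless and separate from the read errors below; as the NOTE says, it can be silenced via that MCA parameter. A minimal sketch, assuming Open MPI's usual OMPI_MCA_* environment-variable convention and an mpi4py-based script:)

```python
import os

# Assumption: Open MPI reads MCA parameters from OMPI_MCA_* environment
# variables, so setting this before MPI initializes silences the warning.
os.environ["OMPI_MCA_btl_base_warn_component_unused"] = "0"

from mpi4py import MPI  # noqa: E402 -- MPI_Init runs on import, after the variable is set
```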
And then, on every PPO update, I get a ton of errors:
[myhost:00537] Read -1, expected 45060, errno = 1
[myhost:00538] Read -1, expected 45064, errno = 1
[myhost:00541] Read -1, expected 45060, errno = 1
[myhost:00542] Read -1, expected 45064, errno = 1
[myhost:00539] Read -1, expected 45060, errno = 1
[myhost:00540] Read -1, expected 45064, errno = 1
[myhost:00543] Read -1, expected 45060, errno = 1
********** Iteration 0 ************
Optimizing...
pol_surr | pol_entpen | vf_loss | kl | ent
[myhost:00543] Read -1, expected 45060, errno = 1
[myhost:00537] Read -1, expected 45060, errno = 1
[myhost:00542] Read -1, expected 45064, errno = 1
[myhost:00539] Read -1, expected 45060, errno = 1
[myhost:00538] Read -1, expected 45064, errno = 1
How can I fix this?
Judging by the error messages, this looks like an issue with MPI itself, outside stable-baselines. I suggest you try the PPO2 version, which is more up to date and does not require MPI to run (and it will use the GPU if one is available).
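A minimal sketch of that switch, with CartPole-v1 standing in as a placeholder for your environment:

```python
import gym

from stable_baselines import PPO2
from stable_baselines.common.vec_env import SubprocVecEnv

def make_env():
    # Placeholder environment -- substitute your own
    return gym.make("CartPole-v1")

if __name__ == "__main__":
    # 8 environments in separate worker processes; no MPI/mpirun needed
    env = SubprocVecEnv([make_env for _ in range(8)])
    model = PPO2("MlpPolicy", env, verbose=1)
    model.learn(total_timesteps=100_000)
```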
Yes, I can use PPO2.
One of the problems with it, though (and I was thinking of creating an issue to discuss this), is that the way multiprocessing is implemented is not very practical.
Take PPO1 with MPI, for example. If I run 8 MPI workers on an 8-core CPU, CPU usage will most likely be around 100%.
However, with PPO2, 8 parallel environments only use around 20% of the CPU. I think this is because PPO2's environments are fully synced: if env1 ends its episode earlier, it still has to wait for all the other envs before starting a new episode. In my opinion, this is a bit clunky in cases where wall-clock time is all that matters. In my case, I end up having to use 64+ envs with PPO2 to fully utilize an 8-core CPU.
That is a good point, and it is possibly addressed in stable-baselines3; this repository will not be getting bigger updates. A similar issue was brought up in #885.
Environments do not wait for others to end; rather, all environments wait for one to reset. If reset takes long, training is slower, and asynchronous algorithms would speed things up. You might also be seeing low CPU usage because of a CPU-GPU communication bottleneck (if a GPU is used), and indeed you usually need to launch many environments to fully utilize the hardware.
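To illustrate the synchronization point, here is a simplified sketch of a SubprocVecEnv-style step (not the library's exact code):

```python
def step_wait(remotes):
    # One pipe per worker process. Each worker steps its env and, if the
    # episode ended, resets it before replying. recv() blocks, so the whole
    # batch advances at the pace of the slowest worker -- e.g. one mid-reset.
    results = [remote.recv() for remote in remotes]
    obs, rewards, dones, infos = zip(*results)
    return obs, rewards, dones, infos
```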
If you're using Docker, you can get rid of these read-error messages by including --cap-add=SYS_PTRACE in the docker run command: Open MPI's shared-memory transport uses cross-memory attach (process_vm_readv), which fails with EPERM (errno = 1) under Docker's default capability settings.