[question] EvalCallback using MPI
When running a training loop using MPI, the EvalCallback doesn't seem to make use of the parallelisation:
for example, in this train function:
https://github.com/hardmaru/slimevolleygym/blob/master/training_scripts/train_ppo_mpi.py
it seems that the EvalCallback will be called once per instance, after a combined total of eval_freq timesteps across all of the instances. This appears to be problematic if you want to use the callback to decide whether to save a new best model: each instance computes its own average reward, so the best_model file can be overwritten several times on the same update. The best score is also naturally inflated the more instances you have, because some of the reward estimates will come out slightly higher than average purely through favourable random fluctuations.
Is this correct?
If so, is there a way to instead split the n_eval_episodes across the workers and aggregate into a single score?
Hello,
When running a training loop using MPI, the EvalCallback doesn't seem to make use of the parallelisation:
Yes, the EvalCallback does not support MPI parallelization.
I would recommend switching to the VecEnv version of PPO (PPO2) if that is possible for you.
Or even switching to Stable-Baselines3 ;): https://github.com/DLR-RM/stable-baselines3
If so, is there a way to instead split the n_eval_episodes across the workers and aggregate into a single score?
This is possible but non-trivial: it would require you to define a custom callback (cf. the docs).
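For what it's worth, here is a rough sketch of what such a callback could look like. The `MPIEvalCallback` name and the episode split are placeholders, and it assumes mpi4py plus the `BaseCallback` / `evaluate_policy` API from stable-baselines 2.10: each worker evaluates its share of the episodes, the per-worker means are averaged with an allreduce, and only rank 0 writes the best model, so the file is saved once per evaluation.

```python
import os
import numpy as np
from mpi4py import MPI
from stable_baselines.common.callbacks import BaseCallback
from stable_baselines.common.evaluation import evaluate_policy


class MPIEvalCallback(BaseCallback):
    """Hypothetical MPI-aware evaluation callback (sketch, not part of stable-baselines)."""

    def __init__(self, eval_env, n_eval_episodes=20, eval_freq=10000,
                 best_model_save_path=None, verbose=1):
        super(MPIEvalCallback, self).__init__(verbose)
        self.eval_env = eval_env
        self.eval_freq = eval_freq
        self.best_model_save_path = best_model_save_path
        self.best_mean_reward = -np.inf
        comm = MPI.COMM_WORLD
        self.rank = comm.Get_rank()
        self.num_workers = comm.Get_size()
        # Split the episode budget across the workers
        self.episodes_per_worker = max(1, n_eval_episodes // self.num_workers)

    def _on_step(self):
        if self.eval_freq > 0 and self.n_calls % self.eval_freq == 0:
            # Each worker evaluates only its own share of episodes
            episode_rewards, _ = evaluate_policy(
                self.model, self.eval_env,
                n_eval_episodes=self.episodes_per_worker,
                return_episode_rewards=True)
            local_mean = np.mean(episode_rewards)
            # Aggregate into a single score across all workers
            global_mean = MPI.COMM_WORLD.allreduce(local_mean, op=MPI.SUM) / self.num_workers
            if self.rank == 0 and self.verbose > 0:
                print("Eval mean reward over all workers: {:.2f}".format(global_mean))
            # Only rank 0 decides whether to save, so best_model is written once
            if self.rank == 0 and global_mean > self.best_mean_reward:
                self.best_mean_reward = global_mean
                if self.best_model_save_path is not None:
                    self.model.save(os.path.join(self.best_model_save_path, "best_model"))
        return True
```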
Hi @araffin - thanks for the reply!
Does VecEnv parallelise the gradient computation, or just the env part?
https://twitter.com/hardmaru/status/1260852988475658242
I've got PPO + MPI working really well on a multicore machine with a custom callback to handle the parallelisation of the evaluation. I'm also hesitant to switch to SB3 as it doesn't support TensorFlow, which is a shame.
Thanks for your help!
Does VecEnv parallelise the gradient computation, or just the env part?
Just the env part.
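To make that concrete, here is a minimal sketch of the VecEnv setup (the env id and the number of workers are just examples): `SubprocVecEnv` steps each environment copy in its own subprocess, while the gradient computation still happens in the single learner process.

```python
import gym
from stable_baselines import PPO2
from stable_baselines.common.vec_env import SubprocVecEnv


def make_env():
    # Placeholder env; swap in your own environment here
    return gym.make("CartPole-v1")


if __name__ == "__main__":
    # 8 copies of the env stepped in parallel subprocesses;
    # only env.step() is parallelised, the policy update is not
    env = SubprocVecEnv([make_env for _ in range(8)])
    model = PPO2("MlpPolicy", env, verbose=1)
    model.learn(total_timesteps=100000)
```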
I've got PPO + MPI working really well on a multicore machine with a custom callback to handle the parallelisation of the evaluation.
SB3 does not support MPI by default, but we would be happy to have an implementation of MPI PPO in our contrib repo ;)
See https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/issues/11
I'm also hesitant to switch to SB3 as it doesn't support TensorFlow, which is a shame.
The decision to move to PyTorch and drop MPI (for the default install) was not arbitrary; see https://github.com/hill-a/stable-baselines/issues/733 and https://github.com/hill-a/stable-baselines/issues/366 ;)
@araffin Has anything changed with regards to SB3 supporting MPI, or is it still not supported?
@araffin Has anything changed with regards to SB3 supporting MPI, or is it still not supported?
It is not (https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/issues/11, https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/issues/45), but contributions are welcome ;)
But with SB3, you can use multiple envs for evaluation, which provides a great speedup.
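For reference, a minimal sketch of that setup in SB3 (env id, eval_freq and the other hyperparameters are placeholders): the evaluation env is vectorised, so EvalCallback collects its evaluation episodes across several env copies at once instead of one at a time.

```python
from stable_baselines3 import PPO
from stable_baselines3.common.callbacks import EvalCallback
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.vec_env import SubprocVecEnv

if __name__ == "__main__":
    # 8 training envs and 4 evaluation envs, each set stepped in parallel subprocesses
    train_env = make_vec_env("CartPole-v1", n_envs=8, vec_env_cls=SubprocVecEnv)
    eval_env = make_vec_env("CartPole-v1", n_envs=4, vec_env_cls=SubprocVecEnv)

    # EvalCallback runs n_eval_episodes over the vectorised eval env
    # and saves the best model in a single place
    eval_callback = EvalCallback(eval_env, n_eval_episodes=20, eval_freq=10_000,
                                 best_model_save_path="./logs/", verbose=1)

    model = PPO("MlpPolicy", train_env, verbose=1)
    model.learn(total_timesteps=200_000, callback=eval_callback)
```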