How can I train an Agent on multiple Environments?
Hi,
I read the GCP tutorial on how to set up Dopamine, but I cannot figure out how to train the agent/brain on multiple environments simultaneously, as you did for the PPO/Rainbow training.
Would it just be a matter of creating several runners?
Thanks a lot in advance,
Cheers Luc
Hi @lagidigu
With Dopamine it is only possible to train on a single environment at a time. OpenAI Baselines, on the other hand, supports multiple concurrent environments for a number of its algorithms (including PPO). Here are instructions for running Unity environments with Baselines: https://github.com/Unity-Technologies/ml-agents/tree/master/gym-unity#running-openai-baselines-algorithms. The only difference is that you will want to replace UnityEnv with ObstacleTowerEnv.
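For reference, here is a minimal sketch of that substitution, following the gym-unity Baselines example from the link above (untested; the worker_id argument is passed so each instance uses its own socket, and the path and environment count are placeholders):

```python
# Minimal sketch: the gym-unity Baselines example with UnityEnv swapped for
# ObstacleTowerEnv. Untested; the path and environment count are placeholders.
from obstacle_tower_env import ObstacleTowerEnv
from baselines.common.vec_env.subproc_vec_env import SubprocVecEnv

def make_otc_vec_env(env_path, num_envs):
    """Create num_envs Obstacle Tower instances wrapped in a SubprocVecEnv."""
    def make_env(rank):
        def _thunk():
            # A distinct worker_id per instance keeps the sockets from colliding.
            return ObstacleTowerEnv(env_path, retro=True, worker_id=rank)
        return _thunk
    return SubprocVecEnv([make_env(rank) for rank in range(num_envs)])

env = make_otc_vec_env('./ObstacleTower/obstacletower', num_envs=4)
```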
When used as shown below, the environment cannot establish a connection to Python; the socket connection simply fails. I tested this on two Windows machines. Surprisingly, if num_env is set to one, two instances of the Obstacle Tower build are launched.
```python
from obstacle_tower_env import ObstacleTowerEnv
import sys
import argparse
from baselines.common.vec_env.subproc_vec_env import SubprocVecEnv
from baselines.bench import Monitor
from baselines import logger
import baselines.ppo2.ppo2 as ppo2
import os
try:
    from mpi4py import MPI
except ImportError:
    MPI = None


def make_unity_env(env_filename, num_env, visual, start_index=1):
    """
    Create a wrapped, monitored Unity environment.
    """
    def make_env(rank, use_visual=True):  # pylint: disable=C0111
        def _thunk():
            env = ObstacleTowerEnv(env_filename, retro=True, realtime_mode=True, worker_id=rank)
            env = Monitor(env, logger.get_dir() and os.path.join(logger.get_dir(), str(rank)))
            return env
        return _thunk
    return SubprocVecEnv([make_env(i + start_index) for i in range(num_env)])


def main():
    env = make_unity_env('./ObstacleTower/obstacletower', 1, True)
    ppo2.learn(
        network="mlp",
        env=env,
        total_timesteps=100000,
        lr=1e-3,
    )


if __name__ == '__main__':
    main()
```
Thanks a lot!