
How can I train an Agent on multiple Environments?

Open lagidigu opened this issue 6 years ago • 3 comments

Hi,

I read the GCP tutorial on how to set up Dopamine, but I cannot figure out how to train the agent/brain on multiple environments simultaneously, as you did during the PPO/Rainbow training.

Would it just be a matter of creating several runners?

Thanks a lot in advance,

Cheers, Luc

lagidigu avatar Mar 01 '19 16:03 lagidigu

Hi @lagidigu

With Dopamine it is only possible to train using a single environment at a time. OpenAI Baselines, on the other hand, allows multiple environments to run concurrently for a number of its algorithms (including PPO). Here are instructions for running Unity environments with Baselines: https://github.com/Unity-Technologies/ml-agents/tree/master/gym-unity#running-openai-baselines-algorithms. The only difference is that you will want to replace UnityEnv with ObstacleTowerEnv.

awjuliani avatar Mar 01 '19 17:03 awjuliani
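For reference, a minimal sketch of that substitution, following the layout of the linked gym-unity example (the binary path, the make_env helper, and the worker_id values below are placeholders, not part of the linked instructions):

```python
# Sketch: the gym-unity Baselines example with UnityEnv swapped for ObstacleTowerEnv.
from obstacle_tower_env import ObstacleTowerEnv
from baselines.common.vec_env.subproc_vec_env import SubprocVecEnv

def make_env(rank):
    def _thunk():
        # Each instance needs a distinct worker_id so it binds its own port.
        return ObstacleTowerEnv('./ObstacleTower/obstacletower',
                                retro=True, worker_id=rank)
    return _thunk

# Four environments running in parallel subprocesses.
env = SubprocVecEnv([make_env(rank) for rank in range(1, 5)])
```

The resulting vectorized env can then be passed to a Baselines algorithm such as ppo2.learn in place of a single gym environment.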

When used as shown below, the environment cannot establish a connection to Python; the socket connection fails. This was tested on two Windows machines. Surprisingly, if num_env is set to 1, two instances of the Obstacle Tower build are launched.

```python
from obstacle_tower_env import ObstacleTowerEnv
from baselines.common.vec_env.subproc_vec_env import SubprocVecEnv
from baselines.bench import Monitor
from baselines import logger
import baselines.ppo2.ppo2 as ppo2

import os

try:
    from mpi4py import MPI
except ImportError:
    MPI = None

def make_unity_env(env_filename, num_env, visual, start_index=1):
    """
    Create a wrapped, monitored Unity environment.
    """
    def make_env(rank, use_visual=True): # pylint: disable=C0111
        def _thunk():
            # worker_id=rank gives each Unity instance its own communication port.
            # Note: realtime_mode=True runs the build at wall-clock speed, which slows training.
            env = ObstacleTowerEnv(env_filename, retro=True, realtime_mode=True, worker_id=rank)
            env = Monitor(env, logger.get_dir() and os.path.join(logger.get_dir(), str(rank)))
            return env
        return _thunk
    return SubprocVecEnv([make_env(i + start_index) for i in range(num_env)])

def main():
    # num_env is 1 here; raising it reproduces the socket failure described above.
    env = make_unity_env('./ObstacleTower/obstacletower', 1, True)
    ppo2.learn(
        network="mlp",
        env=env,
        total_timesteps=100000,
        lr=1e-3,
    )

if __name__ == '__main__':
    main()
```

MarcoMeter avatar Mar 01 '19 22:03 MarcoMeter
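An aside: in ml-agents-based environments, each worker_id offsets the base communication port, so a failure like the one above is often a port that is already occupied (for example by a stale Unity process still holding it). One way to rule SubprocVecEnv in or out is to open two environments by hand with distinct worker_ids; the path and ids below are placeholders:

```python
# Isolation test: launch two Obstacle Tower instances without SubprocVecEnv.
# If this also fails, the problem is in the socket/port setup, not the subprocesses.
from obstacle_tower_env import ObstacleTowerEnv

envs = [
    ObstacleTowerEnv('./ObstacleTower/obstacletower',
                     retro=True, realtime_mode=False, worker_id=wid)
    for wid in (1, 2)
]
for env in envs:
    print(env.reset().shape)  # expect an (84, 84, 3) retro observation per env
for env in envs:
    env.close()
```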

Thanks a lot!

lagidigu avatar Mar 04 '19 09:03 lagidigu