
[FEATURE] Single Process Mava Systems

Open jcformanek opened this issue 2 years ago • 0 comments

Feature

Add support for running Mava systems as a single-process program rather than as a distributed program using Launchpad. Mava systems are very hard to debug because of Launchpad: debuggers such as the VSCode debugger don't work with it, so one ends up relying on print statements to figure out what is going wrong. If systems could easily be run in a single process, one could use a debugger to speed up development.

Proposal

We can add a method to each of the system classes that will run the system as a single process program. The method should work something like this:

import reverb

def run_single_proc_system(self, training_steps_per_episode=4):
    # Start a Reverb server in this process and connect a client to it.
    replay_tables = self.replay()
    replay_server = reverb.Server(tables=replay_tables)
    replay_client = reverb.Client(f'localhost:{replay_server.port}')

    # Build the trainer and the executor in the same process.
    trainer = self.trainer(replay_client)
    executor = self.executor(replay_client, trainer)

    episode = 0
    while True:
        episode += 1
        executor.run_episode()

        # Start training once the episode count reaches the minimum replay size.
        if episode >= self._min_replay_size:
            for _ in range(training_steps_per_episode):
                trainer.step()
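For debugging the control flow itself, a dependency-free version of the same loop may be useful. The sketch below is only an illustration: `StubReplay`, `StubExecutor`, `StubTrainer` and `run_single_process` are hypothetical stand-ins, not Mava or Reverb classes. It also differs from the proposal in two ways: it runs a fixed number of episodes instead of looping forever, and it gates training on the number of items in the buffer rather than on the episode count.

```python
import random
from collections import deque


class StubReplay:
    """Hypothetical in-memory stand-in for a replay table."""

    def __init__(self, max_size=1000):
        self._buffer = deque(maxlen=max_size)

    def add(self, item):
        self._buffer.append(item)

    def sample(self):
        return random.choice(self._buffer)

    def __len__(self):
        return len(self._buffer)


class StubExecutor:
    """Hypothetical executor that writes one item per environment step."""

    def __init__(self, replay, episode_length=5):
        self._replay = replay
        self._episode_length = episode_length

    def run_episode(self):
        for step in range(self._episode_length):
            self._replay.add({"step": step})


class StubTrainer:
    """Hypothetical trainer that samples from replay and counts its steps."""

    def __init__(self, replay):
        self._replay = replay
        self.steps = 0

    def step(self):
        self._replay.sample()
        self.steps += 1


def run_single_process(executor, trainer, replay, num_episodes,
                       min_replay_size, training_steps_per_episode=4):
    """Alternate acting and training in one process for a fixed number of episodes."""
    for _ in range(num_episodes):
        executor.run_episode()
        # Assumption: train only once the buffer holds enough items.
        if len(replay) >= min_replay_size:
            for _ in range(training_steps_per_episode):
                trainer.step()
    return trainer.steps
```

Because everything runs in one process, a breakpoint set inside `run_episode` or `step` is hit directly by an ordinary debugger.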

Testing

Having systems run as a single process should make testing them a lot easier too.

  • The single process systems will each need to be tested.
  • One will need to test that each component in [replay, executor, evaluator, trainer] is instantiated correctly and working as expected.
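The instantiation check in the second bullet could look roughly like the sketch below. `FakeSystem`, `build_components` and `missing_components` are hypothetical names used for illustration, and the builder signatures (in particular `evaluator(trainer)`) are assumptions; real Mava systems will take different arguments.

```python
COMPONENT_NAMES = ("replay", "executor", "evaluator", "trainer")


class FakeSystem:
    """Hypothetical stand-in for a system class with one builder per component."""

    def replay(self):
        return ["replay_table"]

    def trainer(self, replay_client):
        return {"replay_client": replay_client}

    def executor(self, replay_client, trainer):
        return {"replay_client": replay_client, "variable_source": trainer}

    def evaluator(self, trainer):
        # Assumption: the evaluator only needs somewhere to read variables from.
        return {"variable_source": trainer}


def build_components(system, replay_client):
    """Instantiate every component in the current process."""
    replay = system.replay()
    trainer = system.trainer(replay_client)
    return {
        "replay": replay,
        "trainer": trainer,
        "executor": system.executor(replay_client, trainer),
        "evaluator": system.evaluator(trainer),
    }


def missing_components(components):
    """Return the names of the components that were not built."""
    return [name for name in COMPONENT_NAMES if components.get(name) is None]
```

A test for a real system would then assert that `missing_components(...)` is empty and follow up with behavioural checks on each component.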

Benchmarking (Optional)

The single-process systems will all need to be benchmarked to verify that they train properly, albeit more slowly than the distributed programs.

  • Performance (Episode Return)
  • Speed
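Both metrics can be collected with a small harness around the episode loop. The sketch below assumes a hypothetical `run_episode` callable that runs one episode and returns its episode return; `benchmark` is likewise an illustrative name and not part of Mava.

```python
import time
from statistics import mean


def benchmark(run_episode, num_episodes):
    """Run episodes back to back and report mean return and throughput."""
    returns = []
    start = time.perf_counter()
    for _ in range(num_episodes):
        returns.append(run_episode())
    elapsed = time.perf_counter() - start
    return {
        "mean_episode_return": mean(returns),
        # Guard against a zero reading from a coarse timer.
        "episodes_per_second": num_episodes / elapsed if elapsed > 0 else float("inf"),
        "elapsed_seconds": elapsed,
    }
```

Running the same harness against the distributed and the single-process variants of a system would give directly comparable numbers.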

Definition of done

This issue is done when all Mava systems can be run in single-process mode.

Mandatory checklist before making a PR

  • [ ] MADQN supports single process
  • [ ] MADDPG supports single process
  • [ ] Value Decomposition systems support single process
  • [ ] PPO supports single process
  • [ ] All tests are implemented

jcformanek · Mar 03 '22 08:03