
[Question] How best to implement self-play/multiple agents in the same environment?

Open brokenloop opened this issue 6 years ago • 5 comments

I'm trying to train a model using self play, and really love the work that has been done here so far. I was wondering whether anyone might have some advice about how I might adapt PPO2 to allow for multiple models to play against each other in the same environment.

The overall strategy would be to:

  • Store N models in a list
  • Generate an action from each of these models using a single observation
  • Generate a list of rewards for each of these actions from an environment
  • Update the models based on these rewards

I have written a custom environment that can take an array of actions, update the game state, and then return a list of rewards for each agent. My main issue is prying the actual model apart from its interactions with the gym environment. I have been trying to decouple the model from the runner, but the two seem quite tightly intertwined and I'm having a difficult time. Has anyone else played around with this idea before? Or could anyone point me in the right direction?
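To make the intended loop concrete, here is a minimal sketch of the four steps above. `Model` and `MultiAgentEnv` are hypothetical stand-ins (not stable-baselines classes); a real version would substitute PPO2 policies and the custom environment described here.

```python
import random

class Model:
    """Stand-in for one agent's policy (e.g. a PPO2 instance)."""
    def __init__(self, n_actions):
        self.n_actions = n_actions
        self.total_reward = 0.0

    def predict(self, observation):
        # A real policy would map the observation to an action.
        return random.randrange(self.n_actions)

    def update(self, reward):
        # A real implementation would run a gradient step here.
        self.total_reward += reward

class MultiAgentEnv:
    """Stand-in environment: takes a list of actions, returns one reward per agent."""
    def __init__(self, n_agents):
        self.n_agents = n_agents
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state  # single shared observation

    def step(self, actions):
        assert len(actions) == self.n_agents
        self.state += 1
        # Toy reward rule purely for illustration.
        rewards = [1.0 if a == 0 else 0.0 for a in actions]
        return self.state, rewards

N = 3
models = [Model(n_actions=2) for _ in range(N)]   # 1. store N models in a list
env = MultiAgentEnv(n_agents=N)
obs = env.reset()
for _ in range(10):
    actions = [m.predict(obs) for m in models]    # 2. one action per model from a single observation
    obs, rewards = env.step(actions)              # 3. a reward for each action
    for m, r in zip(models, rewards):
        m.update(r)                               # 4. update each model on its reward
```

The hard part in practice is step 4: with stable-baselines, each model's runner normally owns the environment loop, which is why decoupling the model from the runner is the sticking point.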

brokenloop avatar Jan 31 '19 06:01 brokenloop

Hello,

I think @AdamGleave tackled that problem in the Adversarial policies repo, you should take a look ;)

araffin avatar Jun 15 '19 09:06 araffin

I never finished the self-play implementation but it might still be worth looking at: https://github.com/HumanCompatibleAI/adversarial-policies/blob/master/src/aprl/agents/ppo_self_play.py

AdamGleave avatar Jun 16 '19 18:06 AdamGleave

> I never finished the self-play implementation but it might still be worth looking at: https://github.com/HumanCompatibleAI/adversarial-policies/blob/master/src/aprl/agents/ppo_self_play.py

@AdamGleave I can't access the page. Is there still an available/public version of it?

stefanbschneider avatar May 19 '20 15:05 stefanbschneider

Yeah it's still in the commit history.

https://github.com/HumanCompatibleAI/adversarial-policies/tree/99700aab22f99f8353dc74b0ddaf8e5861ff34a5/src/aprl/agents

AdamGleave avatar May 19 '20 21:05 AdamGleave

Here is an example for your reference. https://github.com/hardmaru/slimevolleygym

moliqingwa avatar Nov 02 '21 10:11 moliqingwa