stable-baselines
[Question] How best to implement self-play/multiple agents in the same environment?
I'm trying to train a model using self-play, and really love the work that has been done here so far. I was wondering whether anyone has advice on how I might adapt PPO2 to allow multiple models to play against each other in the same environment.
The overall strategy (sketched in code after this list) would be to:
- Store N models in a list
- Generate an action from each of these models using a single observation
- Generate a list of rewards for each of these actions from an environment
- Update the models based on these rewards
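A minimal sketch of that outer loop might look like the following, assuming a custom multi-agent environment whose `step()` takes one action per agent and returns one reward per agent; `env`, `n_agents`, and `n_steps` are placeholder names, and the per-step update of each PPO2 model is exactly the unresolved part:

```python
from stable_baselines import PPO2

# Placeholder: `env` is the custom multi-agent environment described below,
# `n_agents` and `n_steps` are illustrative constants.
models = [PPO2("MlpPolicy", env) for _ in range(n_agents)]

obs = env.reset()
for _ in range(n_steps):
    # One action per model, all computed from the same shared observation.
    actions = [model.predict(obs)[0] for model in models]
    # The custom env takes the full list of actions and returns a list of rewards.
    obs, rewards, done, info = env.step(actions)
    # Updating each model from its own reward stream is the hard part:
    # PPO2's training loop is tied to its Runner, so these transitions
    # would have to be routed into each model's rollout buffer somehow.
    if done:
        obs = env.reset()
```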
I have written a custom environment that can take an array of actions, update the game state, and then return a list of rewards for each agent. My main issue is in prying apart the actual model from the interactions with the gym environment. I have been trying to decouple the model from the runner, but it seems as if they are quite tightly intertwined and I'm having a difficult time. Has anyone else played around with this idea before, or would anyone be able to point me in the right direction?
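For reference, the environment interface described above might look roughly like this; the class name, spaces, and state update are placeholders, and only the list-of-actions-in, list-of-rewards-out shape matters:

```python
import gym
import numpy as np

class MultiAgentEnv(gym.Env):
    """Illustrative shape of the custom environment: step() takes one
    action per agent and returns one reward per agent."""

    def __init__(self, n_agents):
        super().__init__()
        self.n_agents = n_agents
        self.action_space = gym.spaces.Discrete(4)
        self.observation_space = gym.spaces.Box(-1.0, 1.0, shape=(8,), dtype=np.float32)
        self.state = np.zeros(8, dtype=np.float32)

    def reset(self):
        self.state = np.zeros(8, dtype=np.float32)
        return self.state

    def step(self, actions):
        # `actions` holds one action per agent.
        assert len(actions) == self.n_agents
        # Game-specific state update omitted.
        rewards = [0.0] * self.n_agents  # one reward per agent
        done = False
        return self.state, rewards, done, {}
```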
Hello,
I think @AdamGleave tackled that problem in the adversarial-policies repo; you should take a look ;)
I never finished the self-play implementation but it might still be worth looking at: https://github.com/HumanCompatibleAI/adversarial-policies/blob/master/src/aprl/agents/ppo_self_play.py
@AdamGleave I can't access the page. Is there still an available/public version of it?
Yeah it's still in the commit history.
https://github.com/HumanCompatibleAI/adversarial-policies/tree/99700aab22f99f8353dc74b0ddaf8e5861ff34a5/src/aprl/agents
Here is an example for your reference: https://github.com/hardmaru/slimevolleygym
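For context, a common self-play pattern (and roughly what the slimevolleygym training examples demonstrate) is to bake a frozen opponent into the environment so the learner sees a standard single-agent problem, then periodically refresh the frozen snapshot. A rough sketch of that loop, where `selfplay_env`, `set_opponent`, `n_generations`, and `steps_per_generation` are illustrative names rather than library API:

```python
from stable_baselines import PPO2

# Illustrative sketch only: `selfplay_env` is assumed to be an env that
# holds a frozen opponent policy internally, and `set_opponent()` is a
# hypothetical hook on that env, not a stable-baselines or slimevolleygym API.
model = PPO2("MlpPolicy", selfplay_env)
for generation in range(n_generations):
    # Train against the current frozen opponent as a single-agent problem.
    model.learn(total_timesteps=steps_per_generation)
    # Refresh the frozen opponent with a snapshot of the current learner.
    selfplay_env.set_opponent(model)
```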