Selfplay?
Hi,
So given the changes to poke-env, the old way of doing self-play (examples/experimental-self-play.py) no longer works. I'm a little confused as to how I can do it now, since there's no way of accepting a challenge in openai_api.py.
I essentially just want to be able to battle two agents inheriting from EnvPlayer. If I understand @MatteoH2O1999's comments on #306 correctly, this isn't possible without tying it to a specific ML library. What would be some generic boilerplate code for this?
Hey @akashsara, in general you should have something like
challenge_task = asyncio.ensure_future(
    player1.agent.accept_challenges(player2.username, N_CHALLENGES)
)
for _ in range(N_CHALLENGES):
    state = player1.reset()
    done = False
    while not done:
        action = model1.action(state)
        state, reward, done, info = player1.step(action)
await challenge_task
on one thread and
challenge_task = asyncio.ensure_future(
    player2.agent.send_challenges(player1.username, N_CHALLENGES)
)
for _ in range(N_CHALLENGES):
    state = player2.reset()
    done = False
    while not done:
        action = model2.action(state)
        state, reward, done, info = player2.step(action)
await challenge_task
on another
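For concreteness, here is one way the two loops could be wired together: wrap each one in an async function and run it on its own thread with its own event loop. This is only a sketch, assuming player1, player2, model1, model2 and N_CHALLENGES are already set up elsewhere; run_player is a hypothetical helper, not part of poke-env, and depending on your poke-env version you may need to schedule the challenge coroutines differently.

import asyncio
from threading import Thread

# Hypothetical wrapper around the loop sketched above; `player`, `model` and
# `n_challenges` are whatever you set up elsewhere, and `send` picks between
# sending and accepting the challenges.
async def run_player(player, model, opponent_username, send, n_challenges):
    if send:
        coro = player.agent.send_challenges(opponent_username, n_challenges)
    else:
        coro = player.agent.accept_challenges(opponent_username, n_challenges)
    challenge_task = asyncio.ensure_future(coro)
    for _ in range(n_challenges):
        state = player.reset()
        done = False
        while not done:
            action = model.action(state)
            state, reward, done, info = player.step(action)
    await challenge_task

# One thread per player, each with its own event loop.
t1 = Thread(target=lambda: asyncio.run(run_player(player1, model1, player2.username, False, N_CHALLENGES)))
t2 = Thread(target=lambda: asyncio.run(run_player(player2, model2, player1.username, True, N_CHALLENGES)))
t1.start()
t2.start()
t1.join()
t2.join()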
Hi,
Can you add an example of this? It would be very helpful for those of us who don't have much coding skill yet but would still like to enjoy making some agents :)
Thanks @MatteoH2O1999!
@mancho2000 this is what I ended up using for my implementation of self-play. Note that I coded my own implementation of DQN/PPO, so those bits might not match 1:1 with an existing library. It's also a bit different from what Matteo sent above, since I wanted to train for N steps rather than N battles, but it shouldn't be too much work to change if you want to:
import asyncio
from threading import Thread


async def battle_handler(player1, player2, num_challenges):
    # Keep the two players challenging each other on the network side.
    await asyncio.gather(
        player1.agent.accept_challenges(player2.username, num_challenges),
        player2.agent.send_challenges(player1.username, num_challenges),
    )


def training_function(player, model, model_kwargs):
    # Fit (train) the model as necessary.
    model.fit(player, **model_kwargs)
    player.done_training = True
    # Play out the remaining battles so both fit() functions complete.
    # We use 99 to give the agent an invalid option so it's forced
    # to take a random legal action.
    while player.current_battle and not player.current_battle.finished:
        _ = player.step(99)
if __name__ == "__main__":
    ...
    player1 = SimpleRLPlayer(
        battle_format="gen8randombattle",
        log_level=30,
        opponent="placeholder",
        start_challenging=False,
    )
    player2 = SimpleRLPlayer(
        battle_format="gen8randombattle",
        log_level=30,
        opponent="placeholder",
        start_challenging=False,
    )
    ...
    # Self-play bits
    player1.done_training = False
    player2.done_training = False
    # Get the event loop
    loop = asyncio.get_event_loop()
    # Make two threads, one per player, each running model.fit()
    t1 = Thread(target=lambda: training_function(player1, ppo, p1_env_kwargs))
    t1.start()
    t2 = Thread(target=lambda: training_function(player2, ppo, p2_env_kwargs))
    t2.start()
    # On the network side, keep sending & accepting battles
    while not player1.done_training or not player2.done_training:
        loop.run_until_complete(battle_handler(player1, player2, 1))
    # Wait for thread completion
    t1.join()
    t2.join()
    player1.close(purge=False)
    player2.close(purge=False)
    ...
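In case it helps anyone starting from scratch: SimpleRLPlayer is not defined in the snippet above. A minimal sketch along the lines of poke-env's rl_with_open_ai_gym_wrapper example could look like the following; import paths and method signatures can differ slightly between poke-env versions, so treat this as an assumption to check against your install rather than the exact class used above.

import numpy as np
from gym.spaces import Box, Space

from poke_env.environment.abstract_battle import AbstractBattle
from poke_env.player import Gen8EnvSinglePlayer


class SimpleRLPlayer(Gen8EnvSinglePlayer):
    def calc_reward(self, last_battle, current_battle) -> float:
        # Reward faints, HP swings and victories (helper provided by poke-env).
        return self.reward_computing_helper(
            current_battle, fainted_value=2.0, hp_value=1.0, victory_value=30.0
        )

    def embed_battle(self, battle: AbstractBattle):
        # Deliberately tiny observation: fraction of fainted mons on each side.
        fainted_team = len([m for m in battle.team.values() if m.fainted]) / 6
        fainted_opp = len([m for m in battle.opponent_team.values() if m.fainted]) / 6
        return np.array([fainted_team, fainted_opp], dtype=np.float32)

    def describe_embedding(self) -> Space:
        # Bounds must match whatever embed_battle returns.
        return Box(
            np.array([0.0, 0.0], dtype=np.float32),
            np.array([1.0, 1.0], dtype=np.float32),
            dtype=np.float32,
        )

A richer embed_battle (move base powers, type effectiveness, etc.) is what the official example actually uses; just keep describe_embedding consistent with whatever observation you return.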