Enable true concurrent battles
Currently, it is not possible to have concurrent battles. Although you can specify `max_concurrent_battles`, what it does (as far as I can tell) is spawn multiple battles but only allow you to call `choose_move` on them one by one. This greatly limits the speed of training, as it is much more efficient to run many environments in parallel and batch the inputs than to process them one by one.
Therefore, I propose adding a new `choose_moves` function that returns a list of `battle`s, as well as modifying the necessary source code to enable (true) simultaneous battles. Ideally, this function should not fix the length of the returned list (only cap it at a maximum), returning whatever battles have already been processed at call time, to prevent bottlenecking.
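To make the proposal concrete, here is a rough sketch of what such an API could look like. Everything here is hypothetical - neither `choose_moves` nor `ready_battles` exists in poke-env today:

```python
# Hypothetical sketch of the proposed API - none of these names exist
# in poke-env today.
from typing import List


class BatchedPlayer:
    async def ready_battles(self, max_batch: int = 128) -> List["AbstractBattle"]:
        """Return every battle currently waiting on an order, up to
        max_batch, without blocking until a full batch is reached."""
        ...

    def choose_moves(self, battles: List["AbstractBattle"]) -> List["BattleOrder"]:
        """Pick one order per battle, so a model can process the whole
        batch in a single forward pass."""
        ...
```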
Hey @carbonbased-lifeform,
Are you referring to the gym API or the general API?
General. Does it exist in the gym API? I haven't really used that one yet.
In the general API, most of the time the bottleneck is not on `poke-env`'s side, but on Pokemon Showdown's side. Processing batches of battles on `poke-env`'s side would not make things faster - it would actually probably make things slower, as you would have to wait for the state of `batch_size` battles to be ready before processing them.
In the gym API, your bottleneck might be on the model side, where batching can be helpful. I'm aware of a couple of implementations that do this, but they also have to deal with tweaking their state / action / reward tuple management to make training possible. I do not think this is suitable for the basic API, but it might be added to the docs as an advanced example.
With the general API, making `SimpleHeuristicsPlayer`s play randbats, I can run a bit more than 20 games per second on my couple-of-years-old laptop. I'm sure you could get these numbers up with a better setup.
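For reference, a minimal way to measure this yourself with the general API (assumes a Pokemon Showdown server running locally; the class is spelled `SimpleHeuristicsPlayer` in recent poke-env releases):

```python
import asyncio
import time

from poke_env.player import SimpleHeuristicsPlayer


async def main():
    # Both players connect to a locally running Pokemon Showdown server.
    p1 = SimpleHeuristicsPlayer(
        battle_format="gen8randombattle", max_concurrent_battles=10
    )
    p2 = SimpleHeuristicsPlayer(
        battle_format="gen8randombattle", max_concurrent_battles=10
    )

    start = time.perf_counter()
    await p1.battle_against(p2, n_battles=100)
    print(f"{100 / (time.perf_counter() - start):.1f} battles per second")


asyncio.run(main())
```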
> it would actually probably make things slower, as you would have to wait for the state of `batch_size` battles to be ready before processing them.
As mentioned in the original issue, I proposed that Showdown return all the battles it has already processed. Even if the batch size is not reached, we can simply take whatever games are already processed and run the model on them (this will of course benefit from finding a good balance), since varying batch sizes cause few complications at inference time (unlike training). This ensures that the device running the model (which is more often than not the bottleneck) is always at full capacity. Batching inputs is still vastly more efficient even with the concern you raised: in a small test on a relatively large model, a batch of 1 took 0.1 seconds whereas a batch of 128 took around 0.5 seconds, hence my suggestion.
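To illustrate the variable-batch-size point: inference can simply consume however many battles happen to be ready. A minimal PyTorch sketch, where `encode` (turning one battle into a feature tensor) and the list of ready battles are assumed to come from elsewhere:

```python
import torch


@torch.no_grad()
def batched_orders(model: torch.nn.Module, battles, encode):
    # The batch is simply "whatever battles are ready right now" -
    # varying batch sizes are unproblematic at inference time.
    states = torch.stack([encode(b) for b in battles])
    logits = model(states)
    # One action index per battle, in the same order as the input list.
    return logits.argmax(dim=-1).tolist()
```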
A bit of a nooby question since I haven't tried the gym API yet, but does the gym API run concurrent battles in the background as described above and thus allow batching?
Your remarks make sense w.r.t. running RL training, which is not what the general API is focused on (as opposed to the gym API). The out-of-the-box implementation of the gym API is focused on running a single battle at a time, but you can spawn multiple instances and organize your training flow to benefit from parallelization, including batching. This is what the implementations I was referring to do.
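The multi-instance pattern, in shape only - `run_worker` stands in for your own env setup and rollout loop, and is not a poke-env API:

```python
import multiprocessing as mp


def run_worker(worker_id: int, n_battles: int) -> None:
    # Placeholder: create one gym env (one Player) in this process and
    # step it for n_battles; batching across workers is handled by your
    # model-serving code, not by poke-env itself.
    ...


if __name__ == "__main__":
    workers = [mp.Process(target=run_worker, args=(i, 10_000)) for i in range(8)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
```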
Thanks a lot for the replies. I'll try out the gym API when I have the time. One more question (hope it's not a stupid one): is there any way to modify the `player_network_interface` to run concurrent battles? Given that it is possible to send messages from a single player to multiple rooms, I don't see why the following couldn't be implemented for concurrency, since naïve parallelization using multiple players seems rather inefficient.
`max_concurrent_battles` determines how many battles will be run at the same time by the `Player` object, and relies on `asyncio`. If you want to implement some kind of batching logic, as you were hinting towards, I'd recommend doing so directly in `choose_move` - everything else is doing parsing and / or state tracking, which wouldn't really benefit from parallelization.
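A hedged sketch of what batching inside `choose_move` could look like, assuming `max_concurrent_battles > 1` so several battles can be awaiting orders at once, and assuming your poke-env version awaits `choose_move` when it is a coroutine. `model_forward` is a placeholder for your own batched model call:

```python
import asyncio

from poke_env.player import Player


class BatchingPlayer(Player):
    def __init__(self, *args, batch_window: float = 0.01, **kwargs):
        super().__init__(*args, **kwargs)
        self._pending = []  # (battle, future) pairs awaiting an order
        self._batch_window = batch_window

    async def choose_move(self, battle):
        fut = asyncio.get_running_loop().create_future()
        self._pending.append((battle, fut))
        # Briefly yield so other concurrent battles can join the batch.
        await asyncio.sleep(self._batch_window)
        if self._pending:
            batch, self._pending = self._pending, []
            # model_forward is a placeholder: it should map a list of
            # battles to one move (or switch) per battle in one forward pass.
            moves = self.model_forward([b for b, _ in batch])
            for (_, f), move in zip(batch, moves):
                f.set_result(self.create_order(move))
        return await fut
```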
> In the gym API, your bottleneck might be on the model side, where batching can be helpful. I'm aware of a couple of implementations that do this, but they also have to deal with tweaking their state / action / reward tuple management to make training possible. I do not think this is suitable for the basic API, but it might be added to the docs as an advanced example.
Hi @hsahovic could you point me to some implementations like the ones you mentioned here? I'm in a situation where I'd like to run a large number of battles (1M+) and would like to be able to speed things up by running multiple battles in parallel.
My best suggestion is to get a fast CPU and try running it in the cloud. My setup can only run 200k battles in 15 hours, so one million would take about 75 hours - maybe less if I increase the concurrency from just 100 battles at a time.
You will also run into more issues with things like Revival Blessing, which was a tricky one to fix.
Also, you could look into TorchRL and remote RPC, which I had to implement personally.