acme
Reverb Replay Buffers + Vectorized/Batched Environments
I'm trying to use a Reverb replay buffer with a batched environment like 'envpool', where the API returns a batch of experience whenever either .reset or .step is called.
I'm guessing there must be a better way to insert that data into the buffer than to have a writer for each individual environment and iterate over the writers, appending each one's slice of the batch.
The code below is clearly suboptimal and defeats the purpose of using a vectorized environment as opposed to many workers each executing a single environment.
num_envs = 100
envs = make_envs(num_envs)
# One writer per environment.
writers = [client.writer() for _ in range(num_envs)]
obs = envs.reset()
# obs.shape == (100, 3, 86, 86) 100 atari obs
while True:
    next_obs, reward, done, info = envs.step(action)
    # next_obs.shape == (100, 3, 86, 86)
    for i, writer in enumerate(writers):
        writer.append({
            'obs': obs[i],
            .....
        })
    obs = next_obs
If there are any examples of working with batched environments and reverb in the codebase or if anyone could provide some direction, I'd greatly appreciate it.
I think creating multiple writers is currently the way to go, as Reverb does not provide a native batched append. There has been some discussion about supporting batched environments; see https://github.com/deepmind/reverb/issues/52
If you instantiate multiple writers, there are some recommended setups that allow you to use them concurrently; see https://github.com/deepmind/reverb/issues/78
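To make the "one writer per environment, driven concurrently" idea concrete, here is a minimal sketch using a thread pool. It is not the setup from the linked issue, only one generic way to fan a batch out to per-environment writers. `RecordingWriter`, `split_batch` and `append_batch` are hypothetical names made up for illustration; `RecordingWriter` is a stand-in that mimics only an `append(step)` method so the snippet runs without a Reverb server. Whether threads actually speed up real Reverb writers is an assumption you would need to measure.

```python
from concurrent.futures import ThreadPoolExecutor


class RecordingWriter:
    """Hypothetical stand-in for a per-environment writer.

    It only mimics an `append(step)` method so the sketch runs without a
    Reverb server; swap in real writers in practice.
    """

    def __init__(self):
        self.steps = []

    def append(self, step):
        self.steps.append(step)


def split_batch(batch, num_envs):
    """Turn {'key': [v_0, ..., v_{n-1}]} into one dict per environment."""
    return [{key: values[i] for key, values in batch.items()}
            for i in range(num_envs)]


def append_batch(writers, batch, executor):
    """Append row i of `batch` to writers[i], one task per writer."""
    steps = split_batch(batch, len(writers))
    futures = [executor.submit(writer.append, step)
               for writer, step in zip(writers, steps)]
    # Wait for every append so a writer is never used by two threads at once,
    # and so exceptions raised inside a worker surface here.
    for future in futures:
        future.result()


num_envs = 4
writers = [RecordingWriter() for _ in range(num_envs)]
with ThreadPoolExecutor(max_workers=num_envs) as executor:
    for t in range(3):
        # Placeholder data; a real batch would come from envs.step(...).
        batch = {
            'obs': [f'obs_{t}_{i}' for i in range(num_envs)],
            'reward': [float(i) for i in range(num_envs)],
        }
        append_batch(writers, batch, executor)
```

Because `split_batch` only indexes along the first axis, the same helper should work with batched NumPy arrays in place of the lists used here.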
If you want to use multiple workers, I think the recommended workflow is to use the Launchpad library and the distributed experiment. You should be able to find some examples of how to do that.