VectorizedMultiAgentSimulator

Resetting only the vectorized environments that are done?

Open kfu02 opened this issue 1 year ago • 2 comments

Hi, sorry in advance if this isn't the right place to ask these kinds of questions.

I have been playing with VMAS in its vanilla form (no TorchRL/RLlib) to try to understand how to implement my own Scenarios, and I am currently confused about how VMAS handles resetting the environment. The reset() function docstring states that it handles resetting "in a vectorized way". From my testing, it seems that it resets all vectorized environments.

I was hoping "in a vectorized way" meant that it only reset the environments which were done and left the others alone. I would like it to behave this way to collect episode reward from episodes that are allowed to run until termination, for instance. Does VMAS have this functionality built-in? Am I misunderstanding reset()?

Thank you for the great library, by the way!

kfu02 avatar Dec 26 '23 17:12 kfu02

Hello. Thanks for this question as this is a point I feel it is good to clarify and improve upon.

The current situation

Currently, as you say, there are 2 ways to reset an environment:

  • env.reset(), which resets all environments
  • env.reset_at(index), which resets a specific environment at env_index: int

The way currently available to reset only the done environments is to cycle through the done flags and call reset_at for each done env:

# done: per-environment done flags, shape = [n_envs]
for i in range(n_envs):
    if done[i]:
        env.reset_at(i)
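
For the use case in the question (recording each episode's return once that episode terminates), the loop above can be combined with a per-environment running sum. What follows is a minimal sketch, not VMAS code: ToyVecEnv is a hypothetical stand-in that uses plain Python lists where VMAS would return tensors, and only the reset_at(index) call mirrors the method named above.

```python
class ToyVecEnv:
    """Hypothetical stand-in for a vectorized env (NOT the VMAS API).

    Environment i terminates after episode_lengths[i] steps and pays a
    reward of 1.0 on every step.
    """

    def __init__(self, episode_lengths):
        self.episode_lengths = list(episode_lengths)
        self.n_envs = len(self.episode_lengths)
        self.t = [0] * self.n_envs  # per-environment step counters

    def reset_at(self, index):
        # Reset a single environment and leave the others untouched.
        self.t[index] = 0

    def step(self):
        self.t = [t + 1 for t in self.t]
        rewards = [1.0] * self.n_envs
        done = [t >= n for t, n in zip(self.t, self.episode_lengths)]
        return rewards, done


def collect_episode_returns(env, n_steps):
    """Run n_steps, resetting only finished envs.

    Returns one list per environment holding the return of every
    episode that ran to termination.
    """
    running = [0.0] * env.n_envs
    finished = [[] for _ in range(env.n_envs)]
    for _ in range(n_steps):
        rewards, done = env.step()
        for i in range(env.n_envs):
            running[i] += rewards[i]
            if done[i]:
                finished[i].append(running[i])  # episode i is complete
                running[i] = 0.0
                env.reset_at(i)  # only this env restarts
    return finished


returns = collect_episode_returns(ToyVecEnv([2, 3]), n_steps=6)
print(returns)  # [[2.0, 2.0, 2.0], [3.0, 3.0]]
```

Each environment restarts on its own schedule, so episodes of different lengths are all counted in full rather than being cut short by a global reset.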

The ideal situation

To improve efficiency and avoid this for loop, it would be awesome if the reset_at function also accepted a mask.

Something like:

env.reset_at(done)

This would be amazing. The only problem is that the reset_at function of every current scenario, along with a large part of the simulator logic, would need to be rewritten. So it is not a quick or easy effort.
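
Until something like that exists, calling code can at least get the mask-shaped interface through a small wrapper. The sketch below is an assumption, not part of VMAS: reset_masked and RecordingEnv are hypothetical names, the mask is a plain list of booleans rather than a tensor, and the helper still loops over reset_at internally, so it tidies the call site without delivering the efficiency gain.

```python
def reset_masked(env, done_mask):
    """Hypothetical helper: reset every env whose mask entry is True.

    It delegates to the existing per-index reset_at, so it does not
    remove the loop; it only hides it behind a mask-based call.
    """
    indices = [i for i, is_done in enumerate(done_mask) if is_done]
    for i in indices:
        env.reset_at(i)
    return indices


class RecordingEnv:
    """Hypothetical stub that only records which indices were reset."""

    def __init__(self):
        self.reset_calls = []

    def reset_at(self, index):
        self.reset_calls.append(index)


env = RecordingEnv()
reset_indices = reset_masked(env, [False, True, False, True])
print(reset_indices)  # [1, 3]
```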

A consideration

What I do for some scenarios I create is to not implement a done function, so that all environments are done only after max_steps. This means you can always call env.reset(). I understand that this does not fit all tasks, but I figured I would mention it in case it is helpful.
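
A rough sketch of that fixed-horizon pattern, assuming a toy environment rather than VMAS itself: FixedHorizonEnv is a hypothetical class using plain lists, and its done flags depend only on a shared step counter, so every environment finishes on the same step and a single reset() covers all of them.

```python
class FixedHorizonEnv:
    """Hypothetical stand-in (NOT the VMAS API).

    There is no task-specific done condition, so every environment
    finishes together after max_steps.
    """

    def __init__(self, n_envs, max_steps):
        self.n_envs = n_envs
        self.max_steps = max_steps
        self.t = 0
        self.n_resets = 0

    def reset(self):
        # Resets all environments at once.
        self.t = 0
        self.n_resets += 1

    def step(self):
        self.t += 1
        rewards = [1.0] * self.n_envs
        done = [self.t >= self.max_steps] * self.n_envs
        return rewards, done


env = FixedHorizonEnv(n_envs=3, max_steps=4)
env.reset()
episode_returns = []
running = [0.0] * env.n_envs
for _ in range(8):
    rewards, done = env.step()
    running = [total + r for total, r in zip(running, rewards)]
    if all(done):  # with a fixed horizon the flags are all True together
        episode_returns.append(running)
        running = [0.0] * env.n_envs
        env.reset()

print(episode_returns)  # [[4.0, 4.0, 4.0], [4.0, 4.0, 4.0]]
```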

P.S. This change has long been on our TODOs https://github.com/proroklab/VectorizedMultiAgentSimulator?tab=readme-ov-file#todos

matteobettini avatar Dec 27 '23 11:12 matteobettini

Thank you! Your answer makes sense. I will think over these options.

kfu02 avatar Dec 27 '23 16:12 kfu02