rl icon indicating copy to clipboard operation
rl copied to clipboard

[Feature Request] FrameSkipTransform - output every intermediate step?

Open maxweissenbacher opened this issue 1 year ago • 6 comments

Motivation

A feature request regarding the FrameSkipTransform (documentation here). When the frame_skip parameter is set to an integer greater than 1, i.e. frame_skip = 2, and I do a rollout in the resulting environment with num_steps steps, I receive num_steps observations. The underlying untransformed environment actually takes frame_skip * num_steps steps (with each action being repeated frame_skip times), but the transformed environment only outputs every frame_skip observation.

I find this be a bit counterintuitive. Personally, I would find it more natural if the transformed env would output the full frame_skip * num_steps steps. If say, num_steps =2, then the "action" entry in the TensorDict would then look something like [-0.1, -0.1, -0.15, -0.15, -0.2, -0.2, ...], i.e. it would show the repeated actions.

Solution

Is this doable with the current way Transforms work? If so, happy to contribute this. This could be an option in the FrameSkipTransform, allowing the user to choose between either outputting every step the env actually takes, or only every frame_skip step like it currently does.

Checklist

  • [x] I have checked that there is no similar issue in the repo (required)

maxweissenbacher avatar Oct 19 '23 14:10 maxweissenbacher

We could add a flag in the constructor to allow that, not a bad idea!

vmoens avatar Oct 19 '23 15:10 vmoens

What do you think should happen with batched environments? I don't think we want to keep applying the same action after reset (considering reset happens automatically). If we just return the last observation and the sum of rewards, we can simply not do anything with sub-envs that have reached a done, but it's harder to figure out what should happen if we need to return all the data. Maybe padded values? If massively parallel settings, partial rests will happen all the time so it's important that we provide a scalable solution to this.

vmoens avatar Oct 19 '23 15:10 vmoens

I hadn't thought about this. What happens now in the case of a batched env? If one sub_env reaches a done state say after frame_skip // 2 steps, this env gets reset and then takes frame_skip - frame_skip//2 more steps? I suppose padded values would make sense!

Another question is, realistically how big of a frame_skip will people typically want to use? Presumably, if frame_skip is high, the temporal resolution of the env is high, so padding (or extrapolating?) should usually give reasonable answers?

maxweissenbacher avatar Oct 19 '23 15:10 maxweissenbacher

I've thinking a bit about this transform and I think we should consider carefully what we want from it.

  • returning intermediate step makes total sense. What would it look like though? Currently, we stack the outputs that we get at every step along a new time dimension. Here, should we have a resulting time dimension that is T*S (where T is the time and S the frame skip)? That would be a nightmare to code but maybe more sensible from a UX perspective. The cheap option (my fav) would be to stack along a new frame_skip dim. This will give you actions of shape [B, T, A] (B=batch and A=feature dim of action), and obs/reward/done of shape [B, T, S, O] (where O is the obs dim).
  • what to do with resets puzzles me a bit. Currently we don't have the option of not calling step on sub envs like we have with reset (using the "_reset" keys). That would make sense eventually but it'll be a massive thing to code and could potentially impact the lib runtime on multiple levels. I wonder what other use cases we could find for it, maybe @matteobettini has ideas? The easiest thing ATM is to restrict this transform to simple envs and publicise the huge caveat that it is not supposed to work around batched envs but within them (ie, one copy of the transform per sub-env).

Once we clarify these points I'll be happy to get started refactoring the class!

vmoens avatar Nov 06 '23 13:11 vmoens

I have not seen partial steps yet. I do not really have any usecases in mind apart for the possibility of asyncronously step and reset parts of a batched environemnt (although at that point it is also possible to just instantiate separate environemnts).

Regarding the suggestion of the disclaimer to apply this transform to the sub-envs, I just wanted to say that this is only possible for non-vecotrized envs, as for vecotrized envs there is not really the concept of an independent sub env that could have an individual transform.

I am a but skeptical of coding something with the disclaimer it is not made for batched envs, as TorchRL's claim is that environments are batched by default (all envs have a batch_size) and this is the usecase that is currently envisioned for the future of RL. I would rather have somethng that works for all envs with documented behavior. E.g. for vecotrized envs if a reset happens within the skipping we could keep applying the action (or pad or whatever other strategy).

I am also curious on how the frameskip currently works for batched envs (given the absence of the partial step)

matteobettini avatar Nov 07 '23 09:11 matteobettini

That is exactly the problem, currently (and given what you're saying) this transform cannot work for batched envs unless they reset at the same time.

for vecotrized envs there is not really the concept of an independent sub env that could have an individual transform

Maybe not but we could envision indexing batched envs and only apply steps / resets to a subindex of them. Whether it's done in parallel or vectorized, if the envs are independent it should be possible to work only on a subset. We could even implement a __getitem__ in EnvBase. I'm not saying that it'll be easy but it would be possible.

I am a but skeptical of coding something with the disclaimer it is not made for batched envs

Currently the only alternative is to have nothing at all :)

vmoens avatar Nov 07 '23 13:11 vmoens