rl icon indicating copy to clipboard operation
rl copied to clipboard

[Feature] Added linear interpolation to FrameSkipTransform

Open Zekrom-7780 opened this issue 1 year ago • 6 comments

Description

  • Added an action_interp parameter to enable or disable the action interpolation feature.
  • Added an action_interp_buffer attribute to store the previous action for interpolation.

Motivation and Context

Solves Issue #1659

Types of changes

  • [x] New feature (non-breaking change which adds core functionality)
  • [x] Documentation (update in the documentation)

Checklist

  • [x] I have read the CONTRIBUTION guide (required)
  • [x] My change requires a change to the documentation.
  • [x] I have updated the documentation accordingly.

Reviewers

cc. @vmoens

P.S. I don't know if I have to add unit-tests for this, OR if this is a breaking change, so I haven't added them here.

Zekrom-7780 avatar Nov 10 '23 16:11 Zekrom-7780

:link: Helpful Links

:test_tube: See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/1692

Note: Links to docs will display an error until the docs builds have been completed.

:hourglass_flowing_sand: 8 Pending, 3 Unrelated Failures

As of commit ce0a846cbac298356bd45720c206f5598a055a3c with merge base 11562c765443990e2003653fd74be85c5f730d22 (image):

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

BROKEN TRUNK - The following job failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

pytorch-bot[bot] avatar Nov 10 '23 16:11 pytorch-bot[bot]

@vmoens, added the changes that you told me to.

Ready for another round of comments

would it make sense to store this contiguously (with a torch.stack around it)?

I wasn't sure what to do here, so I just wrapped interpolated_actions around a torch.stack and returned it.

Added action_key = "_action" as a parameter in the constructor, and now, will have to change the docstrings as well Is the action correct, or you wanted a different word, or a different way of doing it?

Thanks so much for this! We need to add a couple of tests for this feature. What should happen at reset time, if there is more than one env being run?

Here, Should I create a action_interp_buffer (let's call it that) for each environment, and like reset when that environment is called? But then how do I get the envirnoment like ID so as to know which one to reset?

Zekrom-7780 avatar Nov 13 '23 18:11 Zekrom-7780

@vmoens, added the changes that you told me to.

Ready for another round of comments

would it make sense to store this contiguously (with a torch.stack around it)?

I wasn't sure what to do here, so I just wrapped interpolated_actions around a torch.stack and returned it.

Added action_key = "_action" as a parameter in the constructor, and now, will have to change the docstrings as well Is the action correct, or you wanted a different word, or a different way of doing it?

Thanks so much for this! We need to add a couple of tests for this feature. What should happen at reset time, if there is more than one env being run?

Here, Should I create a action_interp_buffer (let's call it that) for each environment, and like reset when that environment is called? But then how do I get the envirnoment like ID so as to know which one to reset?

I'd make a step back and define precisely what we expect this to do. What I understand we want from that feature is to have a stack of interpolated actions, for instance:

  • if action at t=0 is act = [1] and action at t=1 with frame_skip=4 is act=[5] we want the interpolated action to be [[1], [2], [3], [4], [5]], is that right?
  • Maybe people will want to have the interpolated action and the action, so we should be able to pass the interpolated action as a separate key. Something like action = [5] and interpolated_action = [[1], [2], [3], [4], [5]]? I think overwriting the "action" entry by default is dangerous because the action you will see out of the transformed environment will be incompatible with the base environment, so the default should be a separate entry for the interpolated action. If people want, they can use the "action" key name in the constructor but that will be subject to bugs (from which the user is responsible since this is not the default).
  • The way we do the interpolation (in a buffer or in the tensordict) seems accessory for now, I would first code the feature and all the test, then we can change the inner mechanism (I'll be happy to help do not worry).
  • Do we allow any action type? Eg, categorical or one-hot? If not we should check that the action_spec of the parent env (transform.parent.full_action_spec[transform.action_key]) has a compatible type (ie, if of dtype float or smth like that).
  • To me the top-1 priority here is to write a series of tests. To do that, go in test_transforms.py and add one or more tests in TestFrameSkipTransform. Check that the resulting action has the right values by inputting two actions at two consecutive steps and check that the values stricly match what you're expecting. That way we will know that any change we make works within the bounds of what we want to do. Also test what happens at reset time. Let's work with a single environment for now (no batched envs) and we'll adapt that to batched envs later.

vmoens avatar Nov 15 '23 12:11 vmoens

@vmoens , I read your comments, and you mention that my first priority should be adding tests

So I looked at the test_transforms.py , and it has like 9000ish lines of code!! Where should I like write my tests? This file made me feel extremely overwhelmed!!

Zekrom-7780 avatar Nov 17 '23 23:11 Zekrom-7780

Ahah I can imagine! There's a TestFrameSkipTransform with all the frame-skip related stuff. You just need a couple of functions that test that if you create a frame stkip transform the results are what is expected

vmoens avatar Dec 01 '23 11:12 vmoens

Thanks a lot, will push some tests later, most probably in 24ish hours

Zekrom-7780 avatar Dec 01 '23 11:12 Zekrom-7780