imitation icon indicating copy to clipboard operation
imitation copied to clipboard

CNN reward functions

Open dfilan opened this issue 2 years ago • 8 comments

(apologies for the bad branch name)

Includes:

  • A new class of CNN reward functions
  • A reward function wrapper to ensure inputs are always channel-first
  • Modifications to normalization code to handle image data
  • Installation of stable-baselines3[extra], to ensure training on atari with AtariPreprocessing works
  • An ipython notebook illustrating the above

dfilan avatar Aug 05 '22 19:08 dfilan

Currently in the process of adding tests to test_reward_nets.py that test the CnnRewardNet (as well as getting the example notebook runtime low enough that I stop getting CellTimeoutErrors on the CI unit tests). Will also need to add some tests for the ChannelFirstRewardWrapper as per this comment.

dfilan avatar Aug 05 '22 23:08 dfilan

Thanks for implementing this!

Feel free to request my review when finished, but I don't want to block this and will be largely unavailable Mon-Wed, so happy to defer to @norabelrose and @tomtseng on this.

Sounds like you might want to hash out a design with Tom first before pushing further on some parts of this? Although I expect a lot of it is independent of his PR (e.g. the more CNN centric components).

AdamGleave avatar Aug 05 '22 23:08 AdamGleave

I think by and large this is independent of @tomtseng's PR - biggest interaction is that it looks like CNNs should be wrapped by default.

dfilan avatar Aug 09 '22 21:08 dfilan

Given discussion here, will get CNNs to always transpose, rather than conditionally doing so.

dfilan avatar Aug 10 '22 01:08 dfilan

Given discussion here, I've added a flag at the creation of the reward net to control transposition behaviour.

dfilan avatar Aug 10 '22 21:08 dfilan

Codecov Report

Merging #519 (3fdc95b) into master (838ccca) will increase coverage by 0.11%. The diff coverage is 98.94%.

@@            Coverage Diff             @@
##           master     #519      +/-   ##
==========================================
+ Coverage   96.97%   97.08%   +0.11%     
==========================================
  Files          84       84              
  Lines        7494     7658     +164     
==========================================
+ Hits         7267     7435     +168     
+ Misses        227      223       -4     
Impacted Files Coverage Δ
src/imitation/algorithms/base.py 98.82% <ø> (ø)
src/imitation/util/networks.py 96.94% <91.30%> (-1.22%) :arrow_down:
src/imitation/rewards/reward_nets.py 100.00% <100.00%> (+2.85%) :arrow_up:
src/imitation/scripts/common/train.py 100.00% <100.00%> (ø)
src/imitation/scripts/train_rl.py 100.00% <100.00%> (ø)
tests/rewards/test_reward_nets.py 100.00% <100.00%> (ø)

:mega: We’re building smart automated test selection to slash your CI/CD build times. Learn more

codecov[bot] avatar Aug 10 '22 22:08 codecov[bot]

Not sure what's going on with code coverage, but I think this is ready for review by @norabelrose and/or @AdamGleave when he gets back.

dfilan avatar Aug 10 '22 23:08 dfilan

Thanks, will aim to review tomorrow (Friday) or Monday.

AdamGleave avatar Aug 11 '22 08:08 AdamGleave

Hi, was trying to review this PR, but hard to read an ipynb from source. Tried to load it into vscode, but it seems that you worked on this from a fork, so it makes reviewing a bit cumbersome. Does anyone know if there a way to check out this branch without having to clone another repo just for this one PR? (Or, alternatively, if this is not too costly for you @dfilan, perhaps you could add your changes directly onto a branch on the CHAI repo).

Rocamonde avatar Aug 15 '22 15:08 Rocamonde

@Rocamonde if you click "View file" you can get a more sane view of the notebook

Screen Shot 2022-08-15 at 11 24 26

It seems like there is a way to checkout branches from a fork, but I haven't tried it before

tomtseng avatar Aug 15 '22 15:08 tomtseng

@Rocamonde if you click "View file" you can get a more sane view of the notebook

Screen Shot 2022-08-15 at 11 24 26

It seems like there is a way to checkout branches from a fork, but I haven't tried it before

Thanks for this, it actually makes a lot of sense. The remote origin you pull from is arbitrary & you can add multiple remotes. Probably better to avoid forks in the future if possible though.

Rocamonde avatar Aug 15 '22 16:08 Rocamonde

Probably better to avoid forks in the future if possible though.

Yep - I think when I started the branch I wasn't able to make one in HumanCompatibleAI, but will do so in future.

dfilan avatar Aug 15 '22 19:08 dfilan

Added wrappers to atari environments to make them constant length. LMK if you think there are too many environments here, or if this should be in seals or something.

dfilan avatar Aug 16 '22 21:08 dfilan

Just realized that constant-length environments can be made much more simply.

dfilan avatar Aug 16 '22 22:08 dfilan

Just want to flag that I just added a CNN potential function in reward_nets.py, as well as initializing and testing potentials in test_reward_nets.py (specifically in the _make_env_and_save_reward_net function, which is fed to _is_reward_valid).

dfilan avatar Aug 19 '22 23:08 dfilan

Have modified CNNs to only accept image observations and discrete actions (when actions are used), and when actions are used to output one reward value per possible action rather than taking the action vector as a CNN input.

dfilan avatar Aug 22 '22 22:08 dfilan

Looks like you've addressed most of my and Juan's suggestions -- please just request re-review from us once you've made all changes you want to in this PR.

AdamGleave avatar Aug 23 '22 03:08 AdamGleave

@Rocamonde do you have any more comments? (If not I think we can merge.)

AdamGleave avatar Aug 24 '22 23:08 AdamGleave

Oops - tests turned out to not actually cover the changes to the new net output, and there are some errors there. Lemme fix those.

dfilan avatar Aug 25 '22 22:08 dfilan

OK, errors fixed. This involved minimal changes to the reward inference code.

dfilan avatar Aug 25 '22 23:08 dfilan

Anybody know why codecov isn't returning a report?

dfilan avatar Aug 25 '22 23:08 dfilan

Normally it waits until all tests pass to do so.

On Fri, Aug 26, 2022 at 12:21 AM, Daniel Filan < @.*** > wrote:

Anybody know why codecov isn't returning a report?

— Reply to this email directly, view it on GitHub ( https://github.com/HumanCompatibleAI/imitation/pull/519#issuecomment-1227848022 ) , or unsubscribe ( https://github.com/notifications/unsubscribe-auth/ABVWH356DTYO7PJIIGST5PDV275YDANCNFSM55XCNJSA ). You are receiving this because you were mentioned. Message ID: <HumanCompatibleAI/imitation/pull/519/c1227848022 @ github. com>

Rocamonde avatar Aug 26 '22 00:08 Rocamonde

Normally it waits until all tests pass to do so.

But all other tests have passed.

dfilan avatar Aug 26 '22 00:08 dfilan

Strange. It's late now here, but will take a look tomorrow morning if you haven't fixed it

On Fri, Aug 26, 2022 at 1:06 AM, Daniel Filan < @.*** > wrote:

Normally it waits until all tests pass to do so. … ( # ) On Fri, Aug 26, 2022 at 12:21 AM, Daniel Filan < @.*** > wrote: Anybody know why codecov isn't returning a report? — Reply to this email directly, view it on GitHub ( #519 (comment) ( https://github.com/HumanCompatibleAI/imitation/pull/519#issuecomment-1227848022 ) ) , or unsubscribe ( https:/ / github. com/ notifications/ unsubscribe-auth/ ABVWH356DTYO7PJIIGST5PDV275YDANCNFSM55XCNJSA ( https://github.com/notifications/unsubscribe-auth/ABVWH356DTYO7PJIIGST5PDV275YDANCNFSM55XCNJSA ) ). You are receiving this because you were mentioned. Message ID: < / pull/ 519 ( https://github.com/HumanCompatibleAI/imitation/pull/519 ) /c1227848022 @ github. com>

But all other tests have passed.

— Reply to this email directly, view it on GitHub ( https://github.com/HumanCompatibleAI/imitation/pull/519#issuecomment-1227871930 ) , or unsubscribe ( https://github.com/notifications/unsubscribe-auth/ABVWH34PE2Z5VGZ764E56CDV3ADBLANCNFSM55XCNJSA ). You are receiving this because you were mentioned. Message ID: <HumanCompatibleAI/imitation/pull/519/c1227871930 @ github. com>

Rocamonde avatar Aug 26 '22 00:08 Rocamonde

I tried to look into this, but I'm not an imitation admin so I cannot inspect the codecov configuration to check if it's a permissions issue. They might have updated the integration hook and it might require re-configuring/re-installing.

On Fri, Aug 26, 2022 at 1:07 AM, Juan Rocamonde < @.*** > wrote:

Strange. It's late now here, but will take a look tomorrow morning if you haven't fixed it

On Fri, Aug 26, 2022 at 1:06 AM, Daniel Filan < notifications@ github. com ( @.*** ) > wrote:

Normally it waits until all tests pass to do so. … ( # ) On Fri, Aug 26, 2022 at 12:21 AM, Daniel Filan < @.*** > wrote: Anybody know why codecov isn't returning a report? — Reply to this email directly, view it on GitHub ( #519 (comment) ( https://github.com/HumanCompatibleAI/imitation/pull/519#issuecomment-1227848022 ) ) , or unsubscribe ( https:/ / github. com/ notifications/ unsubscribe-auth/ ABVWH356DTYO7PJIIGST5PDV275YDANCNFSM55XCNJSA ( https://github.com/notifications/unsubscribe-auth/ABVWH356DTYO7PJIIGST5PDV275YDANCNFSM55XCNJSA ) ). You are receiving this because you were mentioned. Message ID: < / pull/ 519 ( https://github.com/HumanCompatibleAI/imitation/pull/519 ) /c1227848022 @ github. com>

But all other tests have passed.

— Reply to this email directly, view it on GitHub ( https://github.com/HumanCompatibleAI/imitation/pull/519#issuecomment-1227871930 ) , or unsubscribe ( https://github.com/notifications/unsubscribe-auth/ABVWH34PE2Z5VGZ764E56CDV3ADBLANCNFSM55XCNJSA ). You are receiving this because you were mentioned. Message ID: <HumanCompatibleAI/imitation/pull/519/c1227871930 @ github. com>

Rocamonde avatar Aug 26 '22 09:08 Rocamonde

https://codecov.io/gh/HumanCompatibleAI/imitation/commit/f6e2925f6342de8916009aae8370f1f4bb58b4c9/ is showing it, so it seems to have just not reported status to GitHub. I've made an empty commit to re-trigger it, will see if that works.

AdamGleave avatar Aug 26 '22 20:08 AdamGleave

Looks like that worked.

dfilan avatar Aug 26 '22 20:08 dfilan

Looks like that worked.

The reduction in test coverage at https://app.codecov.io/gh/HumanCompatibleAI/imitation/compare/519/changes seems to be driven by:

  • forward of BasicPotentialCNN not being tested. We should probably add a test for that?
  • Not testing the input validation of CnnRewardNet (does it raise an error if you throw in invalid non-image observation space etc). I don't feel strongly about whether or not this should be tested, though if easy perhaps worth doing so.

The code coverage is meant to be a prompt for us to look at what's going on rather than a hard requirement, so if the missing lines do not make sense to cover LMK and I can override this check.

AdamGleave avatar Aug 26 '22 21:08 AdamGleave

The reduction in test coverage at https://app.codecov.io/gh/HumanCompatibleAI/imitation/compare/519/changes seems to be driven by:

* forward of BasicPotentialCNN not being tested. We should probably add a test for that?

Hmmm: I thought this was tested in _make_env_and_save_reward_net and _is_reward_valid and codecov was just confused by the saving/unsaving, but it looks like adding an assert False to forward doesn't actually cause tests to fail. Will deal with that.

* Not testing the input validation of CnnRewardNet (does it raise an error if you throw in invalid non-image observation space etc). I don't feel strongly about whether or not this should be tested, though if easy perhaps worth doing so.

IMO, this should probably not be tested. You could probably manipulate images or non-discrete actions to be inputted into a reward function CNN, and this wouldn't count as 'breaking' the codebase.

dfilan avatar Aug 27 '22 00:08 dfilan

Turns out no potentials were being covered by tests, since the code was calling the predict method of the ShapedRewardNets, which in turn called the base network's predict, which called the base network's forward - not the ShapedRewardNet's forward. Should have a fix up in a few minutes.

dfilan avatar Aug 30 '22 00:08 dfilan