imitation
CNN reward functions
(apologies for the bad branch name)
Includes:
- A new class of CNN reward functions
- A reward function wrapper to ensure inputs are always channel-first
- Modifications to normalization code to handle image data
- Installation of `stable-baselines3[extra]`, to ensure training on Atari with `AtariPreprocessing` works
- An IPython notebook illustrating the above
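As a rough illustration of the channel-first wrapper idea in the list above (the helper name and exact interface here are assumptions for illustration, not the PR's actual code):

```python
import numpy as np

def to_channel_first(obs: np.ndarray) -> np.ndarray:
    """Transpose a batch of HWC observations to CHW, the layout
    PyTorch CNNs expect. Hypothetical helper, for illustration only."""
    # (batch, height, width, channels) -> (batch, channels, height, width)
    return np.transpose(obs, (0, 3, 1, 2))

# Atari-style batch: 8 observations of 84x84 pixels with 4 stacked frames.
batch = np.zeros((8, 84, 84, 4), dtype=np.uint8)
assert to_channel_first(batch).shape == (8, 4, 84, 84)
```

A wrapper along these lines would apply this transpose to every observation before handing it to the reward net.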
Currently in the process of adding tests to `test_reward_nets.py` that test the `CnnRewardNet` (as well as getting the example notebook runtime low enough that I stop getting `CellTimeoutError`s on the CI unit tests). Will also need to add some tests for the `ChannelFirstRewardWrapper`, as per this comment.
Thanks for implementing this!
Feel free to request my review when finished, but I don't want to block this and will be largely unavailable Mon-Wed, so happy to defer to @norabelrose and @tomtseng on this.
Sounds like you might want to hash out a design with Tom first before pushing further on some parts of this? Although I expect a lot of it is independent of his PR (e.g. the more CNN centric components).
I think by and large this is independent of @tomtseng's PR - biggest interaction is that it looks like CNNs should be wrapped by default.
Given discussion here, will get CNNs to always transpose, rather than conditionally doing so.
Given discussion here, I've added a flag at the creation of the reward net to control transposition behaviour.
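A minimal sketch of what a construction-time flag controlling transposition could look like (the flag name `hwc_format` and the class name are illustrative assumptions, not necessarily what the PR uses):

```python
import numpy as np

class CnnRewardNetSketch:
    """Illustrative only: a flag set at construction decides whether
    observations arrive in HWC layout (and get transposed to CHW)
    or are assumed to already be channel-first."""

    def __init__(self, hwc_format: bool = True):
        # hwc_format is a hypothetical name for the PR's transposition flag.
        self.hwc_format = hwc_format

    def preprocess(self, obs: np.ndarray) -> np.ndarray:
        if self.hwc_format:
            # (batch, H, W, C) -> (batch, C, H, W)
            return np.transpose(obs, (0, 3, 1, 2))
        return obs

net = CnnRewardNetSketch(hwc_format=True)
assert net.preprocess(np.zeros((2, 8, 8, 3))).shape == (2, 3, 8, 8)
```

Setting the flag once at creation avoids having to guess the layout per batch, which is the fragility the conditional-transposition discussion was about.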
Codecov Report
Merging #519 (3fdc95b) into master (838ccca) will increase coverage by 0.11%. The diff coverage is 98.94%.
| | master | #519 | +/- |
|---|---|---|---|
| Coverage | 96.97% | 97.08% | +0.11% |
| Files | 84 | 84 | |
| Lines | 7494 | 7658 | +164 |
| Hits | 7267 | 7435 | +168 |
| Misses | 227 | 223 | -4 |
| Impacted Files | Coverage Δ | |
|---|---|---|
| src/imitation/algorithms/base.py | 98.82% <ø> (ø) | |
| src/imitation/util/networks.py | 96.94% <91.30%> (-1.22%) | :arrow_down: |
| src/imitation/rewards/reward_nets.py | 100.00% <100.00%> (+2.85%) | :arrow_up: |
| src/imitation/scripts/common/train.py | 100.00% <100.00%> (ø) | |
| src/imitation/scripts/train_rl.py | 100.00% <100.00%> (ø) | |
| tests/rewards/test_reward_nets.py | 100.00% <100.00%> (ø) | |
Not sure what's going on with code coverage, but I think this is ready for review by @norabelrose and/or @AdamGleave when he gets back.
Thanks, will aim to review tomorrow (Friday) or Monday.
Hi, I was trying to review this PR, but it's hard to read an ipynb from source. I tried to load it into VS Code, but it seems that you worked on this from a fork, which makes reviewing a bit cumbersome. Does anyone know if there is a way to check out this branch without having to clone another repo just for this one PR? (Or, alternatively, if it's not too costly for you @dfilan, perhaps you could add your changes directly onto a branch on the CHAI repo.)
@Rocamonde if you click "View file" you can get a more sane view of the notebook
It seems like there is a way to checkout branches from a fork, but I haven't tried it before
> @Rocamonde if you click "View file" you can get a more sane view of the notebook
> It seems like there is a way to checkout branches from a fork, but I haven't tried it before
Thanks for this, it actually makes a lot of sense. The remote origin you pull from is arbitrary & you can add multiple remotes. Probably better to avoid forks in the future if possible though.
> Probably better to avoid forks in the future if possible though.
Yep - I think when I started the branch I wasn't able to make one in HumanCompatibleAI, but will do so in future.
Added wrappers to atari environments to make them constant length. LMK if you think there are too many environments here, or if this should be in seals or something.
Just realized that constant-length environments can be made much more simply.
Just want to flag that I just added a CNN potential function in `reward_nets.py`, as well as initializing and testing potentials in `test_reward_nets.py` (specifically in the `_make_env_and_save_reward_net` function, which is fed to `_is_reward_valid`).
Have modified CNNs to only accept image observations and discrete actions (when actions are used), and, when actions are used, to output one reward value per possible action rather than taking the action vector as a CNN input.
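The output convention described here can be sketched as follows (function and variable names are illustrative, not the PR's actual API):

```python
import numpy as np

def reward_from_action(per_action_rewards: np.ndarray,
                       actions: np.ndarray) -> np.ndarray:
    """The net emits one reward per discrete action; the reward for a
    transition is the entry selected by the action actually taken.
    Illustrative helper only."""
    # per_action_rewards: (batch, n_actions); actions: (batch,) int indices
    return per_action_rewards[np.arange(len(actions)), actions]

out = np.array([[1.0, 2.0, 3.0],
                [4.0, 5.0, 6.0]])   # pretend CNN output for 2 observations
acts = np.array([2, 0])             # actions taken in each transition
assert np.allclose(reward_from_action(out, acts), [3.0, 4.0])
```

Compared with feeding an action vector into the CNN, this convention only needs one forward pass per observation and sidesteps the question of how to broadcast a discrete action across image channels.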
Looks like you've addressed most of my and Juan's suggestions -- please just request re-review from us once you've made all changes you want to in this PR.
@Rocamonde do you have any more comments? (If not I think we can merge.)
Oops - tests turned out to not actually cover the changes to the new net output, and there are some errors there. Lemme fix those.
OK, errors fixed. This involved minimal changes to the reward inference code.
Anybody know why codecov isn't returning a report?
Normally it waits until all tests pass to do so.
> Normally it waits until all tests pass to do so.
But all other tests have passed.
Strange. It's late now here, but will take a look tomorrow morning if you haven't fixed it
I tried to look into this, but I'm not an imitation admin so I cannot inspect the codecov configuration to check if it's a permissions issue. They might have updated the integration hook and it might require re-configuring/re-installing.
https://codecov.io/gh/HumanCompatibleAI/imitation/commit/f6e2925f6342de8916009aae8370f1f4bb58b4c9/ is showing it, so it seems to have just not reported status to GitHub. I've made an empty commit to re-trigger it, will see if that works.
Looks like that worked.
The reduction in test coverage at https://app.codecov.io/gh/HumanCompatibleAI/imitation/compare/519/changes seems to be driven by:
- `forward` of `BasicPotentialCNN` not being tested. We should probably add a test for that?
- Not testing the input validation of `CnnRewardNet` (does it raise an error if you pass in an invalid, non-image observation space, etc.). I don't feel strongly about whether this should be tested, though if it's easy it's perhaps worth doing.
The code coverage is meant to be a prompt for us to look at what's going on rather than a hard requirement, so if the missing lines do not make sense to cover LMK and I can override this check.
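A test for `forward` along the lines suggested above could check the output shape directly. The stub below only mimics the shape contract of a potential network, since the real `BasicPotentialCNN` lives in `imitation.rewards.reward_nets` and operates on torch tensors; everything here is a sketch:

```python
import numpy as np

class BasicPotentialStub:
    """Stand-in for a potential network: maps a batch of observations
    to one scalar potential per observation. Illustrative only."""

    def forward(self, obs: np.ndarray) -> np.ndarray:
        # Any per-observation scalar works for exercising the contract.
        return obs.reshape(len(obs), -1).mean(axis=1)

def test_potential_forward_shape():
    net = BasicPotentialStub()
    batch = np.zeros((5, 4, 84, 84), dtype=np.float32)  # CHW image batch
    out = net.forward(batch)
    assert out.shape == (5,)  # scalar potential per observation

test_potential_forward_shape()
```

Calling `forward` directly, rather than only through `predict` on a saved/loaded net, is what makes the line show up as covered.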
> The reduction in test coverage at https://app.codecov.io/gh/HumanCompatibleAI/imitation/compare/519/changes seems to be driven by:
> * forward of BasicPotentialCNN not being tested. We should probably add a test for that?
Hmmm: I thought this was tested in `_make_env_and_save_reward_net` and `_is_reward_valid`, and that codecov was just confused by the saving/loading, but it looks like adding an `assert False` to `forward` doesn't actually cause tests to fail. Will deal with that.
> * Not testing the input validation of CnnRewardNet (does it raise an error if you throw in invalid non-image observation space etc). I don't feel strongly about whether or not this should be tested, though if easy perhaps worth doing so.
IMO, this should probably not be tested. You could probably manage to feed manipulated images or non-discrete actions into a reward-function CNN, and that wouldn't count as 'breaking' the codebase.
Turns out no potentials were being covered by tests, since the code was calling the `predict` method of the `ShapedRewardNet`s, which in turn called the base network's `predict`, which called the base network's `forward` - not the `ShapedRewardNet`'s `forward`. Should have a fix up in a few minutes.