Daniel Filan comments

Results: 48 comments of Daniel Filan

[Feature request] Gradient accumulation in CrossEntropyRewardTrainer in preference_comparisons.py

Agree with above, and think that specifying the inner batch size makes more sense.
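
To illustrate what specifying an inner batch size means for gradient accumulation, here is a minimal pure-Python sketch. It is not the `CrossEntropyRewardTrainer` code: the names `grad_mse`, `accumulated_grad`, and `inner_batch_size` are hypothetical, and a one-parameter squared-error loss stands in for the real cross-entropy loss. The point it shows is that summing size-weighted gradients over inner batches reproduces the gradient of the mean loss over the full batch.

```python
def grad_mse(w, xs, ys):
    """Gradient w.r.t. w of the mean squared error of the model y ~ w * x."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n


def accumulated_grad(w, xs, ys, inner_batch_size):
    """Accumulate the gradient over chunks of at most inner_batch_size items."""
    total = len(xs)
    grad = 0.0
    for start in range(0, total, inner_batch_size):
        chunk_x = xs[start:start + inner_batch_size]
        chunk_y = ys[start:start + inner_batch_size]
        # Weight each chunk by its share of the full batch, so the sum
        # matches the gradient of the mean loss over the whole batch.
        grad += grad_mse(w, chunk_x, chunk_y) * len(chunk_x) / total
    return grad


if __name__ == "__main__":
    xs = [1.0, 2.0, 3.0, 4.0, 5.0]
    ys = [2.0, 4.0, 6.0, 8.0, 11.0]
    print(grad_mse(0.5, xs, ys), accumulated_grad(0.5, xs, ys, inner_batch_size=2))
```

With this parameterisation the caller fixes how many samples fit in memory at once, and the number of accumulation steps follows from the full batch size, including a smaller final chunk when the sizes do not divide evenly.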

Test examples sometimes timeout

- In particular, do we know which cells are taking too long to run?
- In general, it seems like CI tests take way longer to run on...

Test examples sometimes timeout

TBC, linked example is from a branch where I'm testing some atari environments - I expect CI tests on master to be a bit quicker (but would be surprised if...

CNN reward functions

Currently in the process of adding tests to `test_reward_nets.py` that test the CnnRewardNet (as well as getting the example notebook runtime low enough that I stop getting CellTimeoutErrors on the...

CNN reward functions

I think by and large this is independent of @tomtseng's PR - biggest interaction is that it looks like CNNs should be wrapped by default.

CNN reward functions

Given discussion [here](https://github.com/HumanCompatibleAI/imitation/issues/486), will get CNNs to always transpose, rather than conditionally doing so.

CNN reward functions

Given discussion [here](https://github.com/HumanCompatibleAI/imitation/issues/486#issuecomment-1211183061), I've added a flag at the creation of the reward net to control transposition behaviour.
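
As a rough picture of what a transposition flag controls, the sketch below converts an image from height-width-channel (HWC) layout to channel-first (CHW) layout only when a flag is set. It uses plain nested lists rather than tensors, and the function name `to_channels_first` and the flag name `hwc_format` are assumptions made for illustration, not a description of the reward net's actual signature.

```python
def to_channels_first(image, hwc_format=True):
    """Return the image in CHW layout.

    image: nested lists indexed as image[h][w][c] when hwc_format is True.
    When hwc_format is False the input is assumed to be CHW already and is
    returned unchanged.
    """
    if not hwc_format:
        return image
    height = len(image)
    width = len(image[0])
    channels = len(image[0][0])
    # Rebuild the nesting so the channel index comes first.
    return [
        [[image[h][w][c] for w in range(width)] for h in range(height)]
        for c in range(channels)
    ]


if __name__ == "__main__":
    # A 2 x 3 image with 2 channels; channel 1 is channel 0 plus 100.
    img = [[[h * 10 + w, 100 + h * 10 + w] for w in range(3)] for h in range(2)]
    chw = to_channels_first(img)
    print(len(chw), len(chw[0]), len(chw[0][0]))
```

Setting the flag once at construction keeps the layout decision explicit, instead of guessing it from the observation shape on every forward pass.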

CNN reward functions

Not sure what's going on with code coverage, but I think this is ready for review by @norabelrose and/or @AdamGleave when he gets back.

CNN reward functions

> Probably better to avoid forks in the future if possible though.

Yep - I think when I started the branch I wasn't able to make one in HumanCompatibleAI, but...

CNN reward functions

Added wrappers to atari environments to make them constant length. LMK if you think there are too many environments here, or if this should be in seals or something.
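
One possible way to make episodes constant length, sketched here without any Gym dependency, is to reset the underlying environment silently whenever it terminates and to report `done` only after a fixed number of steps. This is an illustration under assumptions, not the wrapper from the PR: `CountdownEnv` and `FixedHorizonWrapper` are hypothetical names, and the 4-tuple `step` return mimics the older Gym API.

```python
class CountdownEnv:
    """Toy environment (hypothetical) whose episodes end after `length` steps."""

    def __init__(self, length):
        self.length = length
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        done = self.t >= self.length
        return self.t, 1.0, done, {}


class FixedHorizonWrapper:
    """Hide early termination so every episode lasts exactly `horizon` steps."""

    def __init__(self, env, horizon):
        self.env = env
        self.horizon = horizon
        self.steps = 0

    def reset(self):
        self.steps = 0
        return self.env.reset()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.steps += 1
        if done:
            # Restart the inner episode and hand back its first observation,
            # so the agent never observes a variable-length episode.
            obs = self.env.reset()
        return obs, reward, self.steps >= self.horizon, info


def episode_length(env):
    """Step with a dummy action until the environment reports done."""
    env.reset()
    steps, done = 0, False
    while not done:
        _, _, done, _ = env.step(0)
        steps += 1
    return steps
```

For example, `episode_length(FixedHorizonWrapper(CountdownEnv(3), horizon=10))` runs the inner three-step episode several times over and stops only once ten steps have elapsed.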