
Dynamic L2 regularization for preference comparisons

Open Rocamonde opened this pull request 3 years ago • 11 comments

Fixes #461.

  • [x] Support seeding via generator
  • [x] Implement base class update params
  • [ ] Add regularization param change to logger
  • [x] Add loss_regularize
  • [x] Add weight_regularize

Rocamonde avatar Jul 20 '22 17:07 Rocamonde

Heads up, we'll probably have some conflicts once https://github.com/HumanCompatibleAI/imitation/pull/460 gets merged, although they shouldn't be too hard to resolve.

AdamGleave avatar Jul 21 '22 21:07 AdamGleave

(Pulled all commits from #460 so that we can start integrating that into this PR.)

Rocamonde avatar Jul 24 '22 23:07 Rocamonde

Codecov Report

Merging #481 (8d0b0a9) into master (91c66b7) will increase coverage by 0.10%. The diff coverage is 99.47%.

@@            Coverage Diff             @@
##           master     #481      +/-   ##
==========================================
+ Coverage   97.16%   97.26%   +0.10%     
==========================================
  Files          85       88       +3     
  Lines        7646     8002     +356     
==========================================
+ Hits         7429     7783     +354     
- Misses        217      219       +2     
Impacted Files Coverage Δ
src/imitation/regularization/updaters.py 97.61% <97.61%> (ø)
src/imitation/algorithms/preference_comparisons.py 99.04% <98.48%> (-0.13%) ↓
src/imitation/regularization/regularizers.py 100.00% <100.00%> (ø)
tests/algorithms/test_preference_comparisons.py 100.00% <100.00%> (ø)
tests/test_regularization.py 100.00% <100.00%> (ø)


codecov[bot] avatar Jul 24 '22 23:07 codecov[bot]

Added an initial skeleton for the regularization module. Let me know your thoughts. It seems like weight decay has to act after loss.backward(), but L2 regularization / arbitrary loss penalties have to go before it. So I moved the call to loss.backward() inside the regularizer and replaced that call with .regularize(loss). Each regularizer implementation then computes the final loss and final gradients accordingly.
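For illustration, a minimal sketch of that design (class and method names here are hypothetical, not necessarily what the PR ends up with): the trainer calls .regularize(loss) instead of loss.backward(); a loss-penalty regularizer adds its term before calling backward, while a weight-decay regularizer calls backward first and then shrinks the weights directly.

import abc
import torch as th

class Regularizer(abc.ABC):
    """Hypothetical base class: owns the optimizer and the backward call."""

    def __init__(self, optimizer: th.optim.Optimizer, lambda_: float) -> None:
        self.optimizer = optimizer
        self.lambda_ = lambda_

    @abc.abstractmethod
    def regularize(self, loss: th.Tensor) -> None:
        """Compute the final loss/gradients; replaces a bare loss.backward()."""

class L2Regularizer(Regularizer):
    def regularize(self, loss: th.Tensor) -> None:
        # The penalty must be part of the graph, so add it *before* backward.
        penalty = sum(
            p.pow(2).sum()
            for group in self.optimizer.param_groups
            for p in group["params"]
        )
        (loss + self.lambda_ * penalty).backward()

class WeightDecayRegularizer(Regularizer):
    def regularize(self, loss: th.Tensor) -> None:
        # Weight decay acts on the weights directly, *after* backward.
        loss.backward()
        with th.no_grad():
            for group in self.optimizer.param_groups:
                for p in group["params"]:
                    p.mul_(1 - self.lambda_ * group["lr"])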

Rocamonde avatar Jul 25 '22 09:07 Rocamonde

I've changed the base to reward_ensemble so the diff is cleaner. Once reward_ensemble gets merged we can rebase to master. We shouldn't merge this PR before that point.

AdamGleave avatar Jul 25 '22 18:07 AdamGleave

Sounds good, thanks for doing that. I suspected it was possible but wasn't sure how to.

Rocamonde avatar Jul 25 '22 18:07 Rocamonde

Right now Regularizer seems to be doing two different things:

  • Adaptive regularization via update_params, implemented in WeightDecayRegularizer.
  • Support for regularizing with a variable scale factor, implemented in regularize.

I'm inclined to split these out.

Agreed, that’s what I had in mind from the beginning, but since I wasn’t sure exactly how to implement the abstraction, I hard-coded the update_params function. The idea was to not make it an abstract method but define it in the ABC, where the parameters would be updated by calling a callable passed to __init__(). But I wasn’t sure how general to make the type signature of this callable, so I wrote a sketch of a specific method first to see how that looked.
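A rough standalone sketch of that idea (the names and the callable's signature are guesses, not the interface that ends up in the PR): update_params lives on the base class and delegates to an updater callable supplied at construction time.

from typing import Callable

# Hypothetical updater signature: maps (current value, train loss, val loss)
# to the new value of the regularization parameter.
LambdaUpdater = Callable[[float, float, float], float]

class Regularizer:
    """Sketch: update_params is concrete and delegates to a callable."""

    def __init__(self, initial_lambda: float, lambda_updater: LambdaUpdater) -> None:
        self.lambda_ = initial_lambda
        self.lambda_updater = lambda_updater

    def update_params(self, train_loss: float, val_loss: float) -> None:
        # No subclass override needed; the update policy lives in the callable.
        self.lambda_ = self.lambda_updater(self.lambda_, train_loss, val_loss)

def scale_on_overfit(lambda_: float, train_loss: float, val_loss: float) -> float:
    """Example updater (made up): grow lambda when validation loss lags training loss."""
    ratio = val_loss / max(train_loss, 1e-8)
    if ratio > 1.5:   # validation much worse than training -> regularize more
        return lambda_ * 2.0
    if ratio < 1.1:   # losses close together -> regularize less
        return lambda_ / 2.0
    return lambda_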

Rocamonde avatar Jul 25 '22 19:07 Rocamonde

I guess the uncertainty came from the following points:

  1. Should all regularization parameter updates only take in validation and training loss? Is there anything else that we should include? I think that the idea of the methods we are thinking about is dynamically updating the regularization params based on validation loss, but maybe more general regularization techniques follow completely different approaches. I am inclined to keep it limited to this for now, but worth giving it a think.
  2. Will our regularization techniques only have one parameter or potentially multiple parameters that get updated? If there are multiple parameters, we would have to iterate. But then we might want to keep different tuning hyperparameters separate for each regularization parameter (e.g. the update rate or the initial value). I don't want to introduce unnecessary abstraction to support this if we are only ever likely to support regularization techniques that have only one parameter. Also, if there are multiple parameters, I am inclined to constrain them to be separately "updateable" and iterate over the update loop within the ABC, but maybe there is a use case for updating them simultaneously?
  3. Should the regularization methods be composable? i.e. easy to apply multiple simultaneously, or to add e.g. L2 and L1 loss by just adding/stacking them in some way, rather than having to create a separate method. I am again inclined to not allow this for now, as I have the impression that this would introduce a lot of abstraction that might go unused.

WDYT?

Rocamonde avatar Jul 25 '22 20:07 Rocamonde

Two other things that come to mind:

  1. Do we want to update the parameters after each batch, or after the whole epoch? Currently I implemented the latter, and a cumulative validation loss is what gets passed each time.
  2. Should we include the hyperparams and final params in the logger / serialization dump? Seems important for reproducibility.

Rocamonde avatar Jul 25 '22 20:07 Rocamonde

I guess the uncertainty came from the following points:

  1. I think it's fine to have adaptive regularization take in only validation and training loss. We can add new use cases later down the line as needed. It's going to be very hard to support all possible use cases up front and I don't think we should try.
  2. Will our regularization techniques only have one parameter or potentially multiple parameters that get updated? If there are multiple parameters, we would have to iterate. But then we might want to keep different tuning hyperparameters separate for each regularization parameter (e.g. the update rate or the initial value). I don't want to introduce unnecessary abstraction to support this if we are only ever likely to support regularization techniques that have only one parameter. Also, if there are multiple parameters, I am inclined to constrain them to be separately "updateable" and iterate over the update loop within the ABC, but maybe there is a use case for updating them simultaneously?

I think it's fine to keep supporting only a single parameter for now. We can add support for multiple parameters later without breaking backward compatibility, so no need to engineer it in now.

  3. Should the regularization methods be composable? i.e. easy to apply multiple simultaneously, or to add e.g. L2 and L1 loss by just adding/stacking them in some way, rather than having to create a separate method. I am again inclined to not allow this for now, as I have the impression that this would introduce a lot of abstraction that might go unused.

Hm, good question. I struggle to see a case where we'd really want multiple regularization techniques, but the way it's currently architected might make it hard to add support for this later. I think we could support it by having methods define some loss_regularize (that takes in a loss and applies some transformation to it, e.g. adding a penalty term) and weight_regularize (that updates the parameter groups in self.optimizer directly, as for weight decay), then having some class that lets us chain these together. But this seems like it'd add a fair bit of implementation complexity, so I'm inclined to agree with you that it's not worth supporting right now.
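For what it's worth, a rough sketch of what that chaining could look like (hypothetical names, not something this PR implements): each regularizer exposes both hooks as no-ops by default, and a small wrapper applies them in sequence.

from typing import Sequence
import torch as th

class ChainableRegularizer:
    """Hypothetical interface: both hooks default to no-ops."""

    def loss_regularize(self, loss: th.Tensor) -> th.Tensor:
        return loss  # e.g. an L1/L2 penalty would add a term here

    def weight_regularize(self, optimizer: th.optim.Optimizer) -> None:
        pass  # e.g. weight decay would shrink parameters here

class RegularizerChain(ChainableRegularizer):
    def __init__(self, regularizers: Sequence[ChainableRegularizer]) -> None:
        self.regularizers = list(regularizers)

    def loss_regularize(self, loss: th.Tensor) -> th.Tensor:
        for reg in self.regularizers:
            loss = reg.loss_regularize(loss)
        return loss

    def weight_regularize(self, optimizer: th.optim.Optimizer) -> None:
        for reg in self.regularizers:
            reg.weight_regularize(optimizer)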

Do we want to update the parameters after each batch, or after the whole epoch? Currently I implemented the latter, and a cumulative validation loss is what gets passed each time.

Updating it once per epoch seems reasonable for now. However, if we had very big datasets we might want to update it more than once per epoch, so ideally this would be a design decision that wouldn't be too hard to change.

Should we include the hyperparams and final params in the logger / serialization dump? Seems important for reproducibility.

Definitely worth logging these regularization parameters over time: it'll be a useful metric to track.

Right now I don't see a case for including them in the serialized policy / reward network. For restarting training from a checkpoint, it'd be useful to have the final params (history not required). But we largely don't support checkpointing now (we lose e.g. the momentum parameters from Adam); it'd be a feature we'd like to have in the future, but there's no need to complicate this PR with it.
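For reference, a generic PyTorch sketch (function names and path are hypothetical, not imitation code) of what fuller checkpointing would involve: Adam's momentum and variance buffers live in optimizer.state_dict(), so resuming exactly means saving and restoring it alongside the model weights.

import torch

def save_checkpoint(model, optimizer, path="checkpoint.pt"):
    # The optimizer state_dict carries Adam's exp_avg / exp_avg_sq buffers.
    torch.save(
        {"model": model.state_dict(), "optimizer": optimizer.state_dict()},
        path,
    )

def load_checkpoint(model, optimizer, path="checkpoint.pt"):
    ckpt = torch.load(path)
    model.load_state_dict(ckpt["model"])
    optimizer.load_state_dict(ckpt["optimizer"])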

AdamGleave avatar Jul 26 '22 17:07 AdamGleave

Apart from adding logging, I think this now implements all the required features for an initial version. The only things missing are docstrings, etc. pytype doesn't complain, but I get an error from Pylance saying:

Argument of type "Subset[Unknown]" (train_dataset) cannot be assigned to 
parameter "dataset" of type "PreferenceDataset" in function "_make_data_loader"
  "Subset[Unknown]" is incompatible with "PreferenceDataset"

This is technically correct: Subset and PreferenceDataset are both subtypes of Dataset, but not subtypes of each other. In fact, _make_data_loader could use a more general type AFAIU. Should I change this?
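For concreteness, a minimal sketch of that widening (the real _make_data_loader signature and DataLoader construction in imitation are not shown here, so treat the parameters as assumptions): annotating the parameter with the shared base class torch.utils.data.Dataset makes both PreferenceDataset and a Subset produced by random_split acceptable to the type checker.

import torch.utils.data as th_data

class PreferenceDataset(th_data.Dataset):
    """Stand-in for imitation's PreferenceDataset."""

# Annotating with the shared base class instead of the concrete subclass:
def _make_data_loader(dataset: th_data.Dataset, batch_size: int) -> th_data.DataLoader:
    return th_data.DataLoader(dataset, batch_size=batch_size)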

Rocamonde avatar Jul 31 '22 01:07 Rocamonde

I have no idea what is going on here, I am getting errors on CircleCI that seem so weird and don't show up on my local machine. Before I was getting an indentation error, now I'm getting a sphinx error, neither of which were caught by ci/code_checks.sh.

Rocamonde avatar Aug 16 '22 18:08 Rocamonde

I have no idea what is going on here, I am getting errors on CircleCI that seem so weird and don't show up on my local machine. Before I was getting an indentation error, now I'm getting a sphinx error, neither of which were caught by ci/code_checks.sh.

Probably a version difference. Does pip freeze on your machine match up with "print installed package" output on CircleCI? Could also try "rerunning job with SSH" on CircleCI, SSHing into the machine, and playing around with it?

We do need CI to pass before merge, but I'll start reviewing without.

AdamGleave avatar Aug 18 '22 23:08 AdamGleave

I wonder what actually happens to the loss gradient when adding an L1 norm penalty, since it's not differentiable. Does pytorch compute subgradients? @AdamGleave

Rocamonde avatar Aug 22 '22 22:08 Rocamonde

Added fixes for all the comments, except for moving the file to utils; I'm still thinking about whether that makes sense. I agree that a separate package is overkill at the moment, but maybe it makes sense as a root module, e.g. imitation.regularizers, or similar.

Still need to fix type issues, and add tests.

Rocamonde avatar Aug 22 '22 23:08 Rocamonde

Thanks for the updates!

Moving regularization/__init__.py to some sub-module in regularization seems fine to me as an alternative to util (or utils...). The main question is whether we expect to add multiple things to the regularization package. Introducing a new package seems overkill for a single module, but if we expect to add things to it in the future (or even split the current module into multiple ones), then it'd be good to have.

AdamGleave avatar Aug 23 '22 03:08 AdamGleave

I wonder what actually happens to the loss gradient when adding an L1 norm penalty, since it's not differentiable. Does pytorch compute subgradients? @AdamGleave

Yeah, it uses subgradients at non-differentiable points. For added chaos, different implementations of the same mathematical function can return different subgradients -- e.g. https://github.com/pytorch/pytorch/issues/7172
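A quick way to see this behaviour, assuming a recent PyTorch version:

import torch

# The L1 penalty |x| is non-differentiable at 0; autograd returns one
# particular subgradient there (0 for torch.abs).
x = torch.tensor(0.0, requires_grad=True)
loss = torch.abs(x)
loss.backward()
print(x.grad)  # tensor(0.) -- a valid subgradient of |x| at 0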

AdamGleave avatar Aug 23 '22 03:08 AdamGleave

I have just realized that the regularization parameter updates for the reward trainer in preference comparisons are called within the outer training loop of the preference comparisons class. Therefore, we only dump the final and mean values of the regularization parameter, instead of the value at each step. This is specific to the preference comparisons implementation and is not an issue with general usage of regularization, but it would be good to find a way to dump the data more granularly for the reward trainer.

AFAIK .dump() takes an integer step that corresponds to the PreferenceComparisons iteration, so overloading this with the step of the trainer would probably result in data being overwritten. Allowing for this would likely require non-trivial tweaking of (or a rewrite of) the logger, so it might be better to do that in a different PR.
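To make the granularity issue concrete, here is an illustrative sketch using the stable-baselines3 logger directly (not imitation's logger wrapper; key names and loop sizes are made up). Values recorded in the inner loop collapse to a single latest value or running mean per key, so each dump() writes only one CSV row per outer iteration.

from stable_baselines3.common.logger import configure

logger = configure("/tmp/reg_log", ["csv"])
for outer_iter in range(3):             # preference-comparisons outer loop
    for batch in range(10):             # reward-trainer inner loop
        lam = 0.1 / (batch + 1)         # stand-in for the regularization parameter
        logger.record("regularizer/lambda", lam)            # only the last value survives
        logger.record_mean("regularizer/lambda_mean", lam)  # or a running mean
    logger.dump(step=outer_iter)        # one CSV row per outer iteration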

Other minor improvements to be considered:

  • [x] In regularization tests, add dump() after record() and check that the historic data is logged in the CSV correctly.
  • [x] In preference comparisons, record train and val losses as well, not just the regularization parameter.

Rocamonde avatar Sep 03 '22 16:09 Rocamonde

Added all the requested changes, except the tests for .backward(), which I moved to a new issue, #562. I read through the tests and found that this is a problem we should address more generally, and it probably deserves a separate PR. As far as I could find, we only really check for training progress in 3 algorithms.

Rocamonde avatar Sep 13 '22 11:09 Rocamonde