
Support encoder/decoder freezing.

Open kyduff opened this issue 2 years ago • 1 comment

I propose supporting parameter freezing of the encoder and decoder. I've implemented the feature for use in my own experiments and would like to contribute the patch in response to interest demonstrated in #1857 and https://forum.opennmt.net/t/transformer-freezing-encoder-while-training-decoder/2723

The change works by

  • Conditionally setting requires_grad to False on the encoder/decoder parameters specified in the configuration options (see the sketch after this list)
  • Overwriting the freezing options stored in a loaded checkpoint, which supports the most common use of parameter freezing: transferring weights between training runs and choosing, per run, whether they are updated or kept fixed (e.g. for fine-tuning)
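A minimal sketch of the first point, assuming the usual `model.encoder` / `model.decoder` submodules; the `freeze_encoder` / `freeze_decoder` option names are illustrative, not necessarily the exact ones in the patch:

```python
def freeze_submodules(model, opt):
    """Disable gradients for the encoder and/or decoder as requested in the options."""
    # `freeze_encoder` / `freeze_decoder` are illustrative option names.
    if getattr(opt, "freeze_encoder", False):
        for param in model.encoder.parameters():
            param.requires_grad = False
    if getattr(opt, "freeze_decoder", False):
        for param in model.decoder.parameters():
            param.requires_grad = False
```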

Please leave any thoughts you have as comments. I am happy to refactor to fit the conventions of the repo, etc.

kyduff avatar Jul 26 '22 17:07 kyduff

My apologies for not running the PR checks before submitting! I've run them locally and patched the problem in commit d8ed691; hopefully this will go through correctly.

kyduff avatar Jul 29 '22 11:07 kyduff

Tested by @l-k-11235 and looks good. Merging. Thanks!

francoishernandez avatar Aug 30 '22 13:08 francoishernandez

@kyduff with further testing it seems we have some strange behaviors, notably apex/fp16 not handling None values properly, and optimizer.load_state_dict sometimes raising errors about a mismatch in the parameter groups. Did you face any such issues in your setup? Did you have to pass specific parameters in your configuration (e.g. reset_optim: all)?

francoishernandez avatar Aug 31 '22 15:08 francoishernandez

@francoishernandez I use -reset_optim 'all' in my experiments so I did not encounter this issue—good catch. I've recreated the problem, which seems to occur when you train a frozen model from an unfrozen model & vice versa.

The problem is that parameters with requires_grad False aren't added to the optimizer parameter group, so the optimizer initialized from the frozen model has a shorter parameter list than the optimizer from the unfrozen one. I was able to patch the issue by changing https://github.com/OpenNMT/OpenNMT-py/blob/master/onmt/utils/optimizers.py#L35 from

params = [p for p in model.parameters() if p.requires_grad]

to

params = [p for p in model.parameters()]

I did a quick test to confirm that the relevant parameters retain the proper requires_grad value (at least with Adam).
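For reference, a self-contained toy version of that check (not the exact test from my setup) could look like the following. It relies on the fact that Adam skips parameters whose .grad is None, so frozen parameters passed to the optimizer are never updated:

```python
import torch

# Toy check: pass requires_grad=False parameters to Adam and confirm a step
# leaves them untouched.
frozen = torch.nn.Linear(4, 4)
trainable = torch.nn.Linear(4, 4)
for p in frozen.parameters():
    p.requires_grad = False

params = list(frozen.parameters()) + list(trainable.parameters())
optim = torch.optim.Adam(params, lr=0.1)

before = [p.clone() for p in frozen.parameters()]
loss = trainable(frozen(torch.randn(2, 4))).sum()
loss.backward()
optim.step()

# Frozen parameters received no gradient, so Adam skipped them.
assert all(torch.equal(b, p) for b, p in zip(before, frozen.parameters()))
```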

Alternatively, we could add separate parameter groups for the different parts of the model; something like the sketch below might work.
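A rough sketch of that alternative, again assuming the usual `model.encoder` / `model.decoder` submodules and illustrative `freeze_encoder` / `freeze_decoder` option names:

```python
def build_param_groups(model, opt, lr):
    """Build optimizer parameter groups per submodule, skipping frozen parts."""
    groups = []
    if not getattr(opt, "freeze_encoder", False):
        groups.append({"params": list(model.encoder.parameters()), "lr": lr})
    if not getattr(opt, "freeze_decoder", False):
        groups.append({"params": list(model.decoder.parameters()), "lr": lr})
    # Everything else (e.g. the generator) goes into its own group.
    seen = {id(p) for group in groups for p in group["params"]}
    rest = [p for p in model.parameters() if id(p) not in seen]
    if rest:
        groups.append({"params": rest, "lr": lr})
    return groups

# Usage sketch:
#   optimizer = torch.optim.Adam(build_param_groups(model, opt, lr=opt.learning_rate))
```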

kyduff avatar Aug 31 '22 17:08 kyduff

Also, I have not faced the apex/fp16 issue; I'm not sure why that would happen.

kyduff avatar Aug 31 '22 17:08 kyduff