Update clipping

Open dirkgr opened this issue 2 years ago • 0 comments

The theory is that the optimizer's second-moment estimate goes to zero, which produces a very large update, which in turn causes a loss spike.

  • [x] Generate some checkpoints closer to the spike
  • [x] Implement extra logging so we can make sure that this is actually what happens
  • [ ] Implement update clipping with a maximum per-parameter update norm of 1. Same as Adafactor: https://github.com/google-research/t5x/blob/03dfc44be7f9a93d34c1d7fd6f896d1c364a7d4d/t5x/adafactor.py#L470C1-L476C26
  • [ ] Ablate it
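
To make the theory and the proposed fix concrete, here is a minimal, framework-free sketch. It assumes an Adam-style update of the form `m / (sqrt(v) + eps)` and an Adafactor-style clip that rescales each parameter's update so its root mean square stays at or below a threshold. The function names (`rms`, `adam_update`, `clip_update`), the choice of RMS as the norm, and the `eps` value are assumptions for illustration, not the OLMo or t5x implementation.

```python
import math


def rms(values):
    """Root mean square of a flat list of floats."""
    return math.sqrt(sum(x * x for x in values) / len(values))


def adam_update(m, v, eps=1e-8):
    """Element-wise Adam-style update direction: m / (sqrt(v) + eps).

    m is the first-moment estimate and v the second-moment estimate
    for one parameter tensor (flattened to a list here).
    """
    return [mi / (math.sqrt(vi) + eps) for mi, vi in zip(m, v)]


def clip_update(update, threshold=1.0):
    """Rescale the update so that its RMS does not exceed `threshold`.

    Updates whose RMS is already below the threshold are left unchanged,
    because the scale factor is then exactly 1.0.
    """
    scale = max(1.0, rms(update) / threshold)
    return [u / scale for u in update]


# Hypothetical values: the second moment has collapsed towards zero
# while the first moment is still moderate, so the raw update explodes.
m = [0.1, 0.1]
v = [1e-12, 1e-12]

raw = adam_update(m, v)
clipped = clip_update(raw, threshold=1.0)

print("raw update RMS:    ", rms(raw))
print("clipped update RMS:", rms(clipped))
```

In a real optimizer the clip would be applied per parameter tensor before the learning rate is multiplied in, so the ablation can toggle it with a single flag.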

dirkgr, Aug 18 '23 21:08