Adafactor: Update Clipping vs Gradient Clipping

Open · dlwh opened this issue 2 years ago · 1 comment

The Adafactor paper (section 6) suggests "update clipping" instead of the usual "gradient clipping".

image

"Update clipping" here means clipping the update after all the fancy moment stuff has been applied (line 9 above).

However, the alias defined in alias.py clips the initial gradients:

https://github.com/deepmind/optax/blob/master/optax/_src/alias.py#L132-L140

Is this deliberate? FWIW, t5x's adafactor implements update clipping: https://github.com/google-research/t5x/blob/03dfc44be7f9a93d34c1d7fd6f896d1c364a7d4d/t5x/adafactor.py#L470-L476
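
To make the difference concrete, here is a minimal sketch in plain Python. It is not the optax or t5x code: the helper names (`rms`, `clip_by_rms`, `second_moment_step`) are hypothetical, the second moment is kept unfactored, and the epsilon terms, learning rate, and parameter scaling are left out. It only assumes the same RMS-based clipping rule is used in both cases and shows that the two orderings give different updates.

```python
import math


def rms(x):
    """Root-mean-square of a list of floats."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def clip_by_rms(x, d):
    """Scale x down so that RMS(x) <= d; leave it unchanged if already below d."""
    scale = max(1.0, rms(x) / d)
    return [v / scale for v in x]


def second_moment_step(g, v_prev, beta2):
    """Simplified (unfactored) second-moment scaling: returns g / sqrt(v)."""
    v = [beta2 * vp + (1.0 - beta2) * gi * gi for vp, gi in zip(v_prev, g)]
    return [gi / math.sqrt(vi) for gi, vi in zip(g, v)]


# Illustrative values only (assumptions, not taken from either library).
g = [3.0, 4.0]          # raw gradient
v_prev = [0.25, 0.25]   # running second-moment estimate from earlier steps
beta2, d = 0.9, 1.0     # decay rate and clipping threshold

# Update clipping: scale by the second moment first, then clip the result.
update_clipped = clip_by_rms(second_moment_step(g, v_prev, beta2), d)

# Gradient clipping: clip the raw gradient first, then scale by the second moment.
grad_clipped = second_moment_step(clip_by_rms(g, d), v_prev, beta2)

print("RMS with update clipping:  ", rms(update_clipped))
print("RMS with gradient clipping:", rms(grad_clipped))
```

With these inputs, the update-clipped result has RMS equal to the threshold `d`, while the gradient-clipped result has RMS above `d`, because the division by the square root of the second moment happens after the clip and can scale the update back up.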

dlwh, Aug 30 '22 18:08

Thanks for the question!

@mtthss could you comment on this, since you're the one who ported Adafactor from Flax?

hbq1, Oct 20 '22 11:10