optax
Default epsilon values for AdaBelief don't match the paper
In the AdaBelief paper, there is only one epsilon = 1e-8, which is used both to damp the second-moment estimate and as a constant in the denominator. Optax instead has two constants: `eps = 1e-16` and `root_eps = 1e-16`. At first I only set `eps = 1e-8`, hoping to match the paper, but I just now noticed that I also need to set `root_eps = 1e-8`. A few ideas for how this might be improved:
- Add a note in the documentation
- Use the defaults `eps = 1e-8` and `root_eps = None`, and then set `if root_eps is None: root_eps = eps`.
- At least default to `eps = 1e-8` and `root_eps = 1e-8`.

Is there a particular reason the implementation uses different default hyperparameters?
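Here is a minimal pure-Python sketch of the second proposal, along with a scalar AdaBelief step that shows why the two constants matter. Both functions are hypothetical and are not Optax's implementation. `resolve_adabelief_eps` implements the suggested `None` fallback, and `adabelief_step` follows the paper's update rule for a single scalar parameter. Optax's actual keyword for the second constant is `eps_root` (called `root_eps` above). The sketch assumes that `eps_root` is the constant added to the second-moment estimate and that `eps` is the one added to the denominator.

```python
import math
from typing import Optional


def resolve_adabelief_eps(eps: float = 1e-8,
                          eps_root: Optional[float] = None) -> dict:
    """Hypothetical helper (not part of Optax) for the proposed defaults.

    If eps_root is not given, reuse eps so that one value plays both roles,
    as in the AdaBelief paper. Returns keyword arguments under Optax's names.
    """
    if eps_root is None:  # `is None` check, so an explicit eps_root=0.0 is kept
        eps_root = eps
    return {"eps": eps, "eps_root": eps_root}


def adabelief_step(theta, g, m, s, t, lr=1e-3, b1=0.9, b2=0.999,
                   eps=1e-8, eps_root=1e-8):
    """One scalar AdaBelief step, sketching the paper's update rule.

    eps_root: constant added to the second-moment ("belief") estimate s.
    eps:      constant added to the denominator sqrt(s_hat) + eps.
    """
    m = b1 * m + (1 - b1) * g                        # EMA of gradients
    s = b2 * s + (1 - b2) * (g - m) ** 2 + eps_root  # belief term, damped by eps_root
    m_hat = m / (1 - b1 ** t)                        # bias corrections
    s_hat = s / (1 - b2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(s_hat) + eps)
    return theta, m, s


# First step (t = 1) from a zero state with a small gradient.
g = 1e-6
paper_consts = resolve_adabelief_eps()              # eps = eps_root = 1e-8
default_consts = {"eps": 1e-16, "eps_root": 1e-16}  # Optax's default values
theta_paper, _, _ = adabelief_step(0.0, g, 0.0, 0.0, t=1, **paper_consts)
theta_default, _, _ = adabelief_step(0.0, g, 0.0, 0.0, t=1, **default_consts)
print(theta_paper, theta_default)
```

With a gradient of `1e-6`, the first step under the 1e-16 constants is more than a thousand times larger than under the paper's single epsilon. This happens because `eps_root` no longer dominates the tiny belief term. In Optax itself, the resolved values could presumably be passed on as `optax.adabelief(1e-3, **resolve_adabelief_eps())`, which amounts to `eps=1e-8, eps_root=1e-8`.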