addons
                                
Very strange results with the LAMB optimizer
System information
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux 20.04
- TensorFlow version and how it was installed (source or binary): 2.8, pip install
- TensorFlow-Addons version and how it was installed (source or binary): 1.6.1, pip install
- Python version: 3.8.10
- Is GPU used? (yes/no): yes
Describe the bug
I ran Google's ViT (Vision Transformer) implementation and trained it with both LAMB and Adam. With LAMB I get very strange results.
This looks like a bug.
LAMB in orange, Adam in purple.
See? During the first epoch LAMB pushes the weight distribution into two hills, while Adam does no such thing.
Another shape that appears a lot is this:
It seems the algorithm re-initializes the weights with some kind of Xavier init and then keeps them that way for some reason.
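To make the first-step behaviour easier to reason about, here is a minimal pure-Python sketch of one Adam step and one LAMB step taken from zero-initialized moments. It follows the general description of LAMB (Adam direction rescaled by a layer-wise trust ratio), not the TensorFlow Addons source; the function names, the `eps=1e-6` default, and the toy weights and gradients are all assumptions made for illustration. In this sketch the bias-corrected Adam direction on the first step is roughly `sign(g)`, so LAMB moves every weight in a layer by about the same magnitude, `lr * ||w|| / ||r||`. Whether that is what produces the two hills in the histograms is only a guess and has not been verified against the real optimizer.

```python
import math


def l2_norm(xs):
    """Euclidean norm of a flat list of floats."""
    return math.sqrt(sum(x * x for x in xs))


def adam_direction_first_step(grads, beta1=0.9, beta2=0.999, eps=1e-6):
    """Bias-corrected Adam direction at step t=1, starting from m = v = 0.

    With bias correction, m_hat is about g and v_hat is about g*g,
    so each entry is close to sign(g).
    """
    direction = []
    for g in grads:
        m_hat = ((1 - beta1) * g) / (1 - beta1)
        v_hat = ((1 - beta2) * g * g) / (1 - beta2)
        direction.append(m_hat / (math.sqrt(v_hat) + eps))
    return direction


def adam_first_step(weights, grads, lr=1e-3):
    """Plain Adam: the step size does not depend on the weight norm."""
    r = adam_direction_first_step(grads)
    return [w - lr * ri for w, ri in zip(weights, r)]


def lamb_first_step(weights, grads, lr=1e-3, weight_decay=0.0):
    """LAMB-style step: Adam direction scaled by trust = ||w|| / ||r||.

    Falls back to trust = 1.0 when either norm is zero (an assumption
    of this sketch, e.g. for zero-initialized biases).
    """
    r = adam_direction_first_step(grads)
    r = [ri + weight_decay * w for ri, w in zip(r, weights)]
    w_norm = l2_norm(weights)
    r_norm = l2_norm(r)
    trust = w_norm / r_norm if w_norm > 0 and r_norm > 0 else 1.0
    new_weights = [w - lr * trust * ri for w, ri in zip(weights, r)]
    return new_weights, trust


# Toy layer (made-up numbers): four weights and four gradients.
toy_weights = [0.5, -0.5, 0.5, -0.5]
toy_grads = [0.2, -0.1, -0.3, 0.4]

lamb_weights, trust_ratio = lamb_first_step(toy_weights, toy_grads, lr=0.1)
adam_weights = adam_first_step(toy_weights, toy_grads, lr=0.1)
print("trust ratio:", trust_ratio)
print("LAMB:", lamb_weights)
print("Adam:", adam_weights)
```

In this toy case every weight moves by the same absolute amount in a direction set only by the sign of its gradient, and LAMB's amount is additionally tied to the layer's weight norm. Comparing the per-layer trust ratio of the real run on the first step would be one way to check whether the same thing happens there.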
Code to reproduce the issue
Optimize the ViT transformer from tensorflow/models.
Other info / logs
TensorBoard logs are attached: tensorboard.zip
Actually, it's quite surprising: although the first step might be buggy, the optimizer seems to "understand" this and tries to shift the weights back:
 
 
The "twins" are destroyed.
The "bump" also seems to be softened:
This seems a bit strange, as Adam was able to reach those shapes within the first steps.
@junjiek Are you still active as the LAMB codeowner?