deepmind-research
Is adaptive gradient clipping also applicable to layer normalization?
Hi, my question is as stated in the title. Thanks.