no lr_layer_decay_rate for embedding
Thanks for your work. I found that there is no lr_layer_decay_rate for the embedding layer, which is odd because the embedding layer actually sits below the transformer layers.
Here is a PR, FYI. https://github.com/zihangdai/xlnet/pull/93
Does this mean the learning rate decay was only applied to the 24 transformer layers, and not to the embedding layer or the dense layers for the start and end logits? I'm trying to reproduce the paper's results in PyTorch. @ymcui @zihangdai
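For context, here is roughly what I'm setting up on the PyTorch side. This is a minimal sketch only: the model class, the attribute names (word_embedding, layers, qa_head), and the hyperparameters are placeholders of mine, not names from the repo. It treats the embedding as sitting one level below transformer layer 0, which is what the linked PR proposes.

```python
import torch
import torch.nn as nn

class TinyXLNet(nn.Module):
    """Stand-in for the real model; attribute names are illustrative only."""
    def __init__(self, n_layer=4, d_model=32, vocab=100):
        super().__init__()
        self.word_embedding = nn.Embedding(vocab, d_model)
        self.layers = nn.ModuleList(
            [nn.Linear(d_model, d_model) for _ in range(n_layer)])
        self.qa_head = nn.Linear(d_model, 2)  # start/end logits

def layerwise_lr_groups(model, base_lr, decay_rate):
    """Layer l (0 = bottom) gets lr = base_lr * decay_rate**(n_layer - 1 - l).
    The embedding is placed one step below layer 0, so it gets one extra
    factor of decay_rate; the task head on top gets the full base_lr."""
    n_layer = len(model.layers)
    groups = [{"params": model.word_embedding.parameters(),
               "lr": base_lr * decay_rate ** n_layer}]
    for l, layer in enumerate(model.layers):
        groups.append({"params": layer.parameters(),
                       "lr": base_lr * decay_rate ** (n_layer - 1 - l)})
    groups.append({"params": model.qa_head.parameters(), "lr": base_lr})
    return groups

model = TinyXLNet()
opt = torch.optim.AdamW(layerwise_lr_groups(model, base_lr=3e-5, decay_rate=0.75))
```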
@hlums
see: https://github.com/zihangdai/xlnet/blob/master/model_utils.py#L149
As the code shows, lr_layer_decay_rate is currently applied only to the transformer layers, not to the other parts (embedding, etc.).
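Concretely, the multiplier there is applied by matching each variable's name against the transformer layer pattern, so any variable that doesn't match (the embedding table, the start/end logit dense layers) keeps a gradient multiplier of 1.0. A rough sketch of that selection logic (the regex and variable names are approximated from the linked code, not copied verbatim):

```python
import re

def layer_decay_multiplier(var_name, n_layer, decay_rate):
    """Return the gradient multiplier applied during training.
    Only variables under model/transformer/layer_<l>/ match; everything
    else (embedding, start/end logit dense layers) falls through to 1.0."""
    m = re.search(r"model/transformer/layer_(\d+)/", var_name)
    if m is None:
        return 1.0  # embedding and head variables are left undecayed
    l = int(m.group(1))
    return decay_rate ** (n_layer - 1 - l)

names = [
    "model/transformer/word_embedding/lookup_table",  # -> 1.0 (the issue here)
    "model/transformer/layer_0/ff/kernel",            # -> 0.75 ** 23
    "model/transformer/layer_23/ff/kernel",           # -> 1.0 (top layer)
    "start_logits/dense/kernel",                      # -> 1.0
]
for n in names:
    print(n, layer_decay_multiplier(n, n_layer=24, decay_rate=0.75))
```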