
no lr_layer_decay_rate for embedding

Open fyubang opened this issue 5 years ago • 3 comments

Thanks for your work. I found that lr_layer_decay_rate is not applied to the embedding layer, which is odd because the embedding actually sits below the transformer layers.

fyubang avatar Jul 23 '19 02:07 fyubang

Here is a PR, FYI. https://github.com/zihangdai/xlnet/pull/93

ymcui avatar Jul 23 '19 05:07 ymcui

Does this mean the learning rate decay was only applied to the 24 transformer layers? Not to the embedding layers or the dense layers for start and end logits? I'm trying to reproduce the paper results in PyTorch. @ymcui @zihangdai

hlums avatar Sep 25 '19 18:09 hlums

@hlums see: https://github.com/zihangdai/xlnet/blob/master/model_utils.py#L149 As the code shows, lr_layer_decay is currently applied only to the transformer layers, not to the other parts (embedding, start/end logit dense layers, etc.).
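For reference, the scheme discussed here can be sketched in a few lines. This is a hypothetical illustration, not the actual xlnet code: with n_layer transformer layers and decay rate d, layer i is trained at base_lr * d**(n_layer - 1 - i), so the top layer gets the full rate; treating the embedding as sitting one level below layer 0 (as the linked PR proposes) gives it base_lr * d**n_layer. The function and group names below are made up for the example.

```python
def layerwise_lrs(base_lr, n_layer, decay_rate, include_embedding=True):
    """Return a dict mapping a parameter-group name to its scaled learning rate.

    Layer i (0 = bottom, n_layer-1 = top) gets base_lr * decay_rate**(n_layer-1-i),
    so the rate shrinks geometrically toward the bottom of the network.
    """
    lrs = {
        f"layer_{i}": base_lr * decay_rate ** (n_layer - 1 - i)
        for i in range(n_layer)
    }
    if include_embedding:
        # The embedding is treated as one level below the first transformer
        # layer, so it receives one extra factor of decay_rate.
        lrs["embedding"] = base_lr * decay_rate ** n_layer
    return lrs

if __name__ == "__main__":
    # e.g. XLNet-large: 24 layers; decay rate and base lr are arbitrary here.
    lrs = layerwise_lrs(base_lr=1e-4, n_layer=24, decay_rate=0.75)
    print(lrs["layer_23"], lrs["layer_0"], lrs["embedding"])
```

In a PyTorch reproduction, each entry of such a dict would become one optimizer parameter group with its own `lr`; without the `include_embedding` branch you get the current behavior in this repo, where the embedding is trained at the full base rate.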

ymcui avatar Sep 26 '19 02:09 ymcui