LifeIsStrange
This repository is abandonware; the de facto XLNet standard is https://github.com/huggingface/transformers/tree/master/src/transformers/models/xlnet
@vinhngx I find it deeply sad that the number one state-of-the-art transformer model is in a state of *abandonware*. How many downstream papers did not benefit from FP16...
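For anyone landing here: modern PyTorch exposes mixed precision through `torch.cuda.amp`, so FP16 training no longer needs custom kernels. A minimal sketch of the pattern, with a toy model standing in for XLNet:

```python
import torch
from torch import nn

# Toy model and data so the loop below actually runs on a CUDA device.
device = "cuda"
model = nn.Linear(16, 2).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
scaler = torch.cuda.amp.GradScaler()

for _ in range(10):
    x = torch.randn(8, 16, device=device)
    y = torch.randint(0, 2, (8,), device=device)
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():  # forward pass runs in an FP16/FP32 mix
        loss = loss_fn(model(x), y)
    scaler.scale(loss).backward()    # scale the loss to avoid FP16 gradient underflow
    scaler.step(optimizer)           # unscale grads; skip the step on inf/nan
    scaler.update()
```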
"Maybe it can be used on top of RAdam" My intuition turned out to be right! There is now a ready made combination of both Optimizers: https://medium.com/@lessw/new-deep-learning-optimizer-ranger-synergistic-combination-of-radam-lookahead-for-the-best-of-2dc83f79a48d This is the...
It follows the same line of thought as: https://github.com/zihangdai/xlnet/issues/216
Actually, Mish seems even more interesting to try than Swish: *The experiments show that Mish tends to work better than both ReLU and Swish along with other standard activation...
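For reference, Mish is a one-liner: mish(x) = x * tanh(softplus(x)). A minimal PyTorch sketch (the built-in `F.mish` needs a recent PyTorch, 1.9+):

```python
import torch
import torch.nn.functional as F

def mish(x: torch.Tensor) -> torch.Tensor:
    # Mish: x * tanh(softplus(x)), a smooth, non-monotonic activation.
    return x * torch.tanh(F.softplus(x))

# Sanity check against the built-in implementation.
x = torch.linspace(-3, 3, 7)
print(torch.allclose(mish(x), F.mish(x)))  # True
```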
@huseinzol05 this is a feature request with the potential to improve XLNet results, which would be major. Issues are not necessarily bugs. If the admins disagree, they can close it; I will...
You should contribute to the XLNet implementation in the transformers library: https://huggingface.co/transformers/model_doc/xlnet.html It is the de facto standard.
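For example, loading the pretrained XLNet from transformers takes a few lines (weights are downloaded from the hub on first use; the tokenizer needs the `sentencepiece` package):

```python
import torch
from transformers import XLNetTokenizer, XLNetLMHeadModel

# Load the pretrained XLNet base model and its tokenizer.
tokenizer = XLNetTokenizer.from_pretrained("xlnet-base-cased")
model = XLNetLMHeadModel.from_pretrained("xlnet-base-cased")

inputs = tokenizer("XLNet is a permutation language model.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
print(outputs.logits.shape)  # (batch, sequence_length, vocab_size)
```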
BERT equivalent: https://github.com/google-research/bert/pull/568
https://github.com/NVIDIA/Megatron-LM
@gertqin Hi, friendly ping: this feature is very important for our team. I was wondering whether you think progress will be made on it this year?