LifeIsStrange

Results 246 comments of LifeIsStrange

this repository is abandonware, the de facto XLNet implementation is https://github.com/huggingface/transformers/tree/master/src/transformers/models/xlnet

@vinhngx I find it deeply sad that the number one state-of-the-art transformer model is in a state of *abandonware*. How many downstream papers did not benefit from FP16...

"Maybe it can be used on top of RAdam" My intuition turned out to be right! There is now a ready made combination of both Optimizers: https://medium.com/@lessw/new-deep-learning-optimizer-ranger-synergistic-combination-of-radam-lookahead-for-the-best-of-2dc83f79a48d This is the...

It is in the same line of thought as: https://github.com/zihangdai/xlnet/issues/216

Actually, Mish seems even more interesting to try first than Swish: *The experiments show that Mish tends to work better than both ReLU and Swish along with other standard activation...
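For reference, Mish is a simple smooth activation defined as x * tanh(softplus(x)). A minimal sketch in plain Python (a framework implementation would vectorize this, e.g. over tensors):

```python
import math

def softplus(x: float) -> float:
    # Numerically stable softplus: log(1 + exp(x)),
    # written to avoid overflow for large positive x.
    return math.log1p(math.exp(-abs(x))) + max(x, 0.0)

def mish(x: float) -> float:
    # Mish activation: x * tanh(softplus(x)).
    # Smooth, non-monotonic, and approximately identity for large x.
    return x * math.tanh(softplus(x))
```

Like Swish, it is unbounded above and bounded below, but with a slightly different shape near zero, which the Mish paper credits for its gains over ReLU and Swish.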

@huseinzol05 this is a feature request, a potential improvement to XLNet results which would be major. Issues are not necessarily bugs. If the admins disagree, they can close it, I will...

You should contribute to the XLNet implementation in the transformers library https://huggingface.co/transformers/model_doc/xlnet.html It is the de facto standard

Bert equivalent https://github.com/google-research/bert/pull/568

https://github.com/NVIDIA/Megatron-LM

@gertqin Hi, friendly ping, this feature is very important for our team. I was wondering whether you think progress will be made on it this year?