3 comments by casaro

Note that 117M and 124M are exactly the same checkpoint. It has around 124M parameters but was initially called 117M; I guess they renamed it when they noticed this...

You can check the hparams.json file in Google storage. It is the same for all models: `"n_vocab": 50257`.
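
For example, a quick way to confirm this is to fetch each model's hparams.json and compare the vocabulary sizes. This is only a minimal sketch; the base URL and model names below are assumptions and the files may have moved since the original release:

```python
import json
import urllib.request

# Assumed base URL -- the GPT-2 checkpoints were originally served from
# Google Cloud Storage; adjust the path if the hosting has changed.
BASE = "https://storage.googleapis.com/gpt-2/models"

for model in ["124M", "355M", "774M", "1558M"]:
    url = f"{BASE}/{model}/hparams.json"
    with urllib.request.urlopen(url) as resp:
        hparams = json.load(resp)
    # n_vocab should print as 50257 for every released model size.
    print(model, hparams["n_vocab"])
```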

I think you have already started overfitting. Your loss on train is `0.159` but on valid it is `0.609`. Our run stopped with a loss of `0.356`. Here is an exhaustive list of...
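
For context, here is a minimal sketch of validation-based early stopping that would catch this kind of train/valid divergence. The training and evaluation functions are placeholders, not the actual script used in the run above:

```python
import random

# Hypothetical stand-ins for a real training loop; replace with your own
# per-epoch train and validation steps.
def train_one_epoch():
    return random.uniform(0.1, 0.7)

def evaluate_valid():
    return random.uniform(0.3, 0.9)

best_valid = float("inf")
patience, bad_epochs = 3, 0

for epoch in range(100):
    train_loss = train_one_epoch()
    valid_loss = evaluate_valid()
    # A widening gap (e.g. 0.159 train vs 0.609 valid) is a sign of overfitting.
    if valid_loss < best_valid:
        best_valid, bad_epochs = valid_loss, 0  # would save a checkpoint here
    else:
        bad_epochs += 1
        if bad_epochs >= patience:
            break  # stop before the train/valid gap grows further

print(f"stopped at epoch {epoch}, best valid loss {best_valid:.3f}")
```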