3 comments by casaro

Note that 117M and 124M are exactly the same checkpoint. It has around 124M parameters but was initially called 117M; I guess they renamed it when they noticed this...

You can check the hparams.json file in Google storage. It is the same for all models: `"n_vocab": 50257`.
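
For example, a quick way to confirm this is to fetch each model's hparams.json and compare the vocabulary sizes. This is only a minimal sketch; the base URL and model names below are assumptions and the files may have moved since the original release:

```python
import json
import urllib.request

# Assumed base URL -- the GPT-2 checkpoints were originally served from
# Google Cloud Storage; adjust the path if the hosting has changed.
BASE = "https://storage.googleapis.com/gpt-2/models"

for model in ["124M", "355M", "774M", "1558M"]:
    url = f"{BASE}/{model}/hparams.json"
    with urllib.request.urlopen(url) as resp:
        hparams = json.load(resp)
    # n_vocab should print as 50257 for every released model size.
    print(model, hparams["n_vocab"])
```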

I think you have already started overfitting. Your loss on train is `0.159` but on valid it is `0.609`. Our run stopped with a loss of `0.356`. Here is an exhaustive list of...
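
For context, here is a minimal sketch of validation-based early stopping that would catch this kind of train/valid divergence. The training and evaluation functions are placeholders, not the actual script used in the run above:

```python
import random

# Hypothetical stand-ins for a real training loop; replace with your own
# per-epoch train and validation steps.
def train_one_epoch():
    return random.uniform(0.1, 0.7)

def evaluate_valid():
    return random.uniform(0.3, 0.9)

best_valid = float("inf")
patience, bad_epochs = 3, 0

for epoch in range(100):
    train_loss = train_one_epoch()
    valid_loss = evaluate_valid()
    # A widening gap (e.g. 0.159 train vs 0.609 valid) is a sign of overfitting.
    if valid_loss < best_valid:
        best_valid, bad_epochs = valid_loss, 0  # would save a checkpoint here
    else:
        bad_epochs += 1
        if bad_epochs >= patience:
            break  # stop before the train/valid gap grows further

print(f"stopped at epoch {epoch}, best valid loss {best_valid:.3f}")
```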