Steindór Oddur Ellertsson

Results: 5 comments by Steindór Oddur Ellertsson

Are you pre-training from scratch or initializing from the .h5 file? I've been pre-training with initialization from the .h5 file, and the loss appears to be unchanged between epochs ```...

I trained from scratch and saw no difference. I reduced the dataset size to only 10,000 sentences to make it easier to debug and perhaps make the model overfit the data...

Any updates on this?

When you say TPU v3 (128Gb), do you mean the TPU-v3 with 128 cores? When training with small batch sizes, are you seeing a drop in performance compared to higher batch...

@peregilk Thank you for the response and the article. The point about the performance of LAMB vs. Adam is interesting. I will be training ALBERT on a v3-256 TPU in the coming weeks, will...