Steindór Oddur Ellertsson comments

Results: 5 comments of Steindór Oddur Ellertsson

Pre-training using GPUs is strange

Are you pre-training from scratch or initializing from the .h5 file? I've been pre-training with init from the h5 file and the loss appears to be unchanged between epochs...

Pre-training using GPUs is strange

I trained from scratch and saw no difference. I reduced the dataset size to only 10,000 sentences to make it easier to debug and perhaps make the model overfit the data...

Multilingual Albert

Any updates on this?

Training from scratch on TPU

When you say TPU v3 (128Gb), do you mean the TPU v3 with 128 cores? When training with small batch sizes, are you seeing a drop in performance compared to higher batch...

Training from scratch on TPU

@peregilk Thank you for the response and the article. Interesting point regarding the performance of LAMB vs. Adam. I will be training ALBERT on a v3-256 TPU in the coming weeks, will...