Training time
I am considering training ALBERT from scratch in another language on a single TPU v3 (128 GB). I have a corpus of around 2B words.
Would this be a sufficient corpus size? Could you give a rough estimate of how long this would take for the various models?
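For a rough sense of scale, here is a back-of-envelope sketch in Python. The 512-token sequence length, 4096-sequence batch size, and 125k training steps come from the ALBERT paper's pretraining setup; the tokens-per-word ratio and the throughput figure are pure assumptions you would need to replace with measurements for your language and hardware:

```python
def pretraining_scale_estimate(
    corpus_words: float = 2e9,       # corpus size from the question above
    tokens_per_word: float = 1.3,    # ASSUMED WordPiece fan-out; varies a lot by language
    seq_length: int = 512,           # max sequence length used in the ALBERT paper
    batch_size: int = 4096,          # pretraining batch size from the ALBERT paper
    train_steps: int = 125_000,      # pretraining steps from the ALBERT paper
    seqs_per_second: float = 100.0,  # ASSUMED throughput on one TPU v3-8; measure your own
):
    """Back-of-envelope estimate of corpus coverage and wall-clock time."""
    total_tokens = corpus_words * tokens_per_word
    corpus_sequences = total_tokens / seq_length          # sequences the corpus yields
    sequences_seen = batch_size * train_steps             # sequences consumed in training
    epochs = sequences_seen / corpus_sequences            # how often the corpus is repeated
    wall_clock_days = sequences_seen / seqs_per_second / 86_400
    return corpus_sequences, epochs, wall_clock_days


seqs, epochs, days = pretraining_scale_estimate()
print(f"~{seqs / 1e6:.1f}M sequences in the corpus, "
      f"repeated ~{epochs:.0f}x over training, "
      f"~{days:.0f} days at the assumed throughput")
```

With these numbers a 2B-word corpus yields roughly 5M sequences, which the paper's 125k-step/4096-batch schedule would cycle through about 100 times; whether that is "sufficient" and how many days it takes depends entirely on the model size and the real throughput you observe.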
I want to do the same. Is it recommended to train on Cloud TPUs or on a stack of 2 or 4 local 2080 Ti GPUs?