Training time
I am considering training ALBERT from scratch in another language on a single TPU v3 (128 GB). I have a corpus of around 2B words.
Would this be a sufficient corpus size? Could you give a rough estimate of how long this would take for the various models?
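For a rough sense of scale, here is a back-of-envelope sketch in Python. The 512-token sequence length, 4096-sequence batch size, and 125k training steps come from the ALBERT paper's pretraining setup; the tokens-per-word ratio and the throughput figure are pure assumptions you would need to replace with measurements for your language and hardware:

```python
def pretraining_scale_estimate(
    corpus_words: float = 2e9,       # corpus size from the question above
    tokens_per_word: float = 1.3,    # ASSUMED WordPiece fan-out; varies a lot by language
    seq_length: int = 512,           # max sequence length used in the ALBERT paper
    batch_size: int = 4096,          # pretraining batch size from the ALBERT paper
    train_steps: int = 125_000,      # pretraining steps from the ALBERT paper
    seqs_per_second: float = 100.0,  # ASSUMED throughput on one TPU v3-8; measure your own
):
    """Back-of-envelope estimate of corpus coverage and wall-clock time."""
    total_tokens = corpus_words * tokens_per_word
    corpus_sequences = total_tokens / seq_length          # sequences the corpus yields
    sequences_seen = batch_size * train_steps             # sequences consumed in training
    epochs = sequences_seen / corpus_sequences            # how often the corpus is repeated
    wall_clock_days = sequences_seen / seqs_per_second / 86_400
    return corpus_sequences, epochs, wall_clock_days


seqs, epochs, days = pretraining_scale_estimate()
print(f"~{seqs / 1e6:.1f}M sequences in the corpus, "
      f"repeated ~{epochs:.0f}x over training, "
      f"~{days:.0f} days at the assumed throughput")
```

With these numbers a 2B-word corpus yields roughly 5M sequences, which the paper's 125k-step/4096-batch schedule would cycle through about 100 times; whether that is "sufficient" and how many days it takes depends entirely on the model size and the real throughput you observe.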
I want to do the same. Is it recommended to train on Cloud TPUs or on a stack of 2 or 4 local 2080 Ti GPUs?