albert
Pre-training ALBERT with fp16 and other optimizations
Hi, how would one go about pre-training ALBERT with fp16 weights? Is it also possible to train ALBERT on multiple GPUs? It would also be great if anyone who has used transfer learning to teach the English ALBERT a different language could share their experience with me.
That experience would be very valuable. I also think this paper is relevant to the topic: https://arxiv.org/abs/1910.11856