BERT-pytorch
What's your dataset?
@iOSGeekerOfChina I haven't decided yet, I just started this project an hour ago haha. Do you think using the dataset referred to in the paper is a good idea? Or do you have another good idea? thanx 👍
Maybe you could try a multilingual corpus, not just English, hah
@crazyofapple Totally agree haha. Right now I'm trying to train this model on a Korean corpus with 2x 1080 Ti. But seriously, the model is too big for an individual researcher.... we need some NASA-scale GPU power.
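For anyone else trying two-card training, here is a minimal, generic PyTorch sketch of splitting batches across 2x 1080 Ti with nn.DataParallel. The tiny stand-in model and shapes are made up just to keep it self-contained; it is not this repo's actual training entry point.

```python
import torch
import torch.nn as nn

# Stand-in model just to keep the sketch self-contained; in practice this
# would be the BERT model built by this repo.
model = nn.Sequential(
    nn.Embedding(8000, 256),
    nn.Linear(256, 8000),
)

if torch.cuda.is_available() and torch.cuda.device_count() > 1:
    # Replicate the model on GPU 0 and 1; each forward pass splits the
    # input batch between the two cards and gathers the outputs on GPU 0.
    model = nn.DataParallel(model, device_ids=[0, 1])

device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

# Dummy batch of token ids, shape (batch, seq_len); DataParallel splits dim 0.
tokens = torch.randint(0, 8000, (64, 128), device=device)
logits = model(tokens)  # (64, 128, 8000)
```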
Just using the same dataset as the original paper might be better, I think.
@MrRace I'd love to do it, if I had lots of 2080 Tis. https://twitter.com/Tim_Dettmers/status/1050787783004942336
Regarding compute for BERT: it uses 256 TPU-days, similar to the OpenAI model. Lots of TPUs parallelize about 25% better than GPUs. RTX 2080 Ti and V100 should be at ~70% and ~90% matmul perf vs a TPU if you use 16-bit (important!). BERT ~= 375 RTX 2080 Ti days or 275 V100 days.
@codertimo http://timdettmers.com/2018/10/17/tpus-vs-gpus-for-transformers-bert/
@MrRace
On a standard, affordable GPU machine with 4 GPUs one can expect to train BERT for about 99 days using 16-bit or about 21 days using 8-bit.
Haha 99 days LoL.
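For what it's worth, a quick back-of-the-envelope script for the numbers quoted above. All constants (256 TPU-days, ~70%/~90% relative matmul perf, ~95% multi-GPU scaling) are assumptions lifted from the tweet and blog post, so the results only roughly match the quoted 375 / 275 GPU-days and 99 days.

```python
# Rough conversion of the quoted TPU estimate into single-GPU days and
# wall-clock days on a 4-GPU box. Constants are assumptions, not measurements.
tpu_days = 256                      # total compute quoted for BERT
perf_vs_tpu = {"RTX 2080 Ti": 0.70, # assumed 16-bit matmul perf relative to a TPU
               "V100": 0.90}
scaling_eff = 0.95                  # assumed multi-GPU scaling efficiency

for gpu, rel in perf_vs_tpu.items():
    single = tpu_days / rel
    four_card = single / (4 * scaling_eff)
    print(f"{gpu}: ~{single:.0f} GPU-days total, ~{four_card:.0f} days on 4 cards")
# RTX 2080 Ti: ~366 GPU-days total, ~96 days on 4 cards
# V100: ~284 GPU-days total, ~75 days on 4 cards
```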