
What's your dataset?

FFMRyan opened this issue 6 years ago · 7 comments

FFMRyan · Oct 15 '18 14:10

@iOSGeekerOfChina I haven't decided yet, I just started this project an hour ago haha. Do you think using the dataset referred to in the paper is a good idea? Or do you have another good idea? thanx 👍

codertimo · Oct 15 '18 14:10

Maybe you can try some multilingual corpora, not just English, hah

crazyofapple · Oct 18 '18 02:10

@crazyofapple Totally agree haha. Right now I'm trying to train this model on a Korean corpus with 2x 1080 Ti. But seriously, the model is too big for an individual researcher... we need some NASA-scale GPU power.
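(For anyone curious, here is a minimal sketch of what a two-GPU training loop could look like with plain PyTorch `DataParallel`. The `model` and `train_loader` names are placeholders, not this repo's actual training entry point.)

```python
import torch
import torch.nn as nn

# Hypothetical sketch only: `model` and `train_loader` stand in for whatever
# BERT module and masked-LM dataloader you are using; not this repo's real API.
def train_on_two_gpus(model: nn.Module, train_loader, epochs: int = 1):
    device = torch.device("cuda")
    # Replicate the model across both 1080 Tis; each batch is split along dim 0.
    model = nn.DataParallel(model, device_ids=[0, 1]).to(device)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    criterion = nn.CrossEntropyLoss(ignore_index=0)  # assumes token id 0 = padding

    for _ in range(epochs):
        for tokens, labels in train_loader:
            tokens, labels = tokens.to(device), labels.to(device)
            logits = model(tokens)                        # (batch, seq, vocab)
            loss = criterion(logits.transpose(1, 2), labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```

With two 11 GB cards the per-GPU batch size will stay small, so gradient accumulation or a smaller hidden size would probably be needed; that part is omitted here.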

codertimo · Oct 18 '18 02:10

Just using the same dataset as the original paper would maybe be better, I think.

MrRace · Oct 18 '18 03:10

@MrRace I'd love to, if I had enough 2080 Tis. https://twitter.com/Tim_Dettmers/status/1050787783004942336

Regarding compute for BERT: Uses 256 TPU-hours similar to the OpenAI model. Lots of TPUs parallelize about 25% better than GPUs. RTX 2080 Ti and V100 should be ~70% matmul and ~90% matmul perf vs TPU if you use 16-bit (important!). BERT ~= 375 RTX 2080 Ti days or 275 V100 days.
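(To put those numbers in perspective, here is a rough back-of-the-envelope calculation. The 375/275 single-GPU-day figures come from the tweet above; the GPU counts and the per-GPU scaling-efficiency factor are illustrative assumptions, not measurements.)

```python
# Back-of-the-envelope wall-clock estimate built from the figures in the tweet.
# Assumption: multi-GPU scaling loses ~5% efficiency per extra GPU, a crude
# stand-in for the "GPUs parallelize worse than TPUs" remark above.
SINGLE_GPU_DAYS = {"RTX 2080 Ti": 375, "V100": 275}  # from the tweet

def wall_clock_days(gpu: str, num_gpus: int, efficiency: float = 0.95) -> float:
    """Estimated days to pretrain BERT on `num_gpus` cards of type `gpu`."""
    return SINGLE_GPU_DAYS[gpu] / (num_gpus * efficiency ** (num_gpus - 1))

for n in (1, 2, 4, 8):
    print(f"{n} x RTX 2080 Ti: ~{wall_clock_days('RTX 2080 Ti', n):.0f} days")
# 1 x RTX 2080 Ti: ~375 days
# 2 x RTX 2080 Ti: ~197 days
# 4 x RTX 2080 Ti: ~109 days
# 8 x RTX 2080 Ti: ~67 days
```

Even under these optimistic assumptions, a couple of consumer cards stays in "months to a year" territory, which is exactly the NASA-scale GPU problem above.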

codertimo · Oct 18 '18 14:10

@codertimo http://timdettmers.com/2018/10/17/tpus-vs-gpus-for-transformers-bert/

MrRace · Oct 19 '18 01:10

@MrRace

On a standard, affordable GPU machine with 4 GPUs one can expect to train BERT for about 99 days using 16-bit or about 21 days using 8-bit.

Haha 99 days LoL.
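(Quick sanity check, assuming that estimate builds on the ~375 single-2080-Ti-day figure quoted above: 375 / 4 ≈ 94 days with ideal linear scaling, so ~99 days is consistent with a small multi-GPU overhead, and the ~21-day figure implies roughly a 4-5x projected speedup from 8-bit training.)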

codertimo · Oct 19 '18 01:10