BERT-pytorch
What's your dataset?
@iOSGeekerOfChina I haven't decided yet, I just started this project an hour ago haha. Do you think using the dataset referred to in the paper is a good idea? Or do you have another good idea? thanx 👍
Maybe you could try a multilingual corpus, not just English, hah
@crazyofapple Totally agree haha. Right now I'm trying to train this model on a Korean corpus with 2x 1080 Ti. But seriously, the model is too big for an individual researcher.... we need some NASA-scale GPU power.
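For anyone else trying two-card training, here is a minimal, generic PyTorch sketch of splitting batches across 2x 1080 Ti with nn.DataParallel. The tiny stand-in model and shapes are made up just to keep it self-contained; it is not this repo's actual training entry point.

```python
import torch
import torch.nn as nn

# Stand-in model just to keep the sketch self-contained; in practice this
# would be the BERT model built by this repo.
model = nn.Sequential(
    nn.Embedding(8000, 256),
    nn.Linear(256, 8000),
)

if torch.cuda.is_available() and torch.cuda.device_count() > 1:
    # Replicate the model on GPU 0 and 1; each forward pass splits the
    # input batch between the two cards and gathers the outputs on GPU 0.
    model = nn.DataParallel(model, device_ids=[0, 1])

device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

# Dummy batch of token ids, shape (batch, seq_len); DataParallel splits dim 0.
tokens = torch.randint(0, 8000, (64, 128), device=device)
logits = model(tokens)  # (64, 128, 8000)
```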
Just using the same dataset as the original paper might be better, I think.
@MrRace I'd love to do it, if I had lots of 2080 Tis. https://twitter.com/Tim_Dettmers/status/1050787783004942336
Regarding compute for BERT: it uses 256 TPU-days, similar to the OpenAI model. Lots of TPUs parallelize about 25% better than GPUs. RTX 2080 Ti and V100 should be at ~70% and ~90% matmul perf vs a TPU if you use 16-bit (important!). BERT ~= 375 RTX 2080 Ti days or 275 V100 days.
@codertimo http://timdettmers.com/2018/10/17/tpus-vs-gpus-for-transformers-bert/
@MrRace
On a standard, affordable GPU machine with 4 GPUs one can expect to train BERT for about 99 days using 16-bit or about 21 days using 8-bit.
Haha 99 days LoL.
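For what it's worth, a quick back-of-the-envelope script for the numbers quoted above. All constants (256 TPU-days, ~70%/~90% relative matmul perf, ~95% multi-GPU scaling) are assumptions lifted from the tweet and blog post, so the results only roughly match the quoted 375 / 275 GPU-days and 99 days.

```python
# Rough conversion of the quoted TPU estimate into single-GPU days and
# wall-clock days on a 4-GPU box. Constants are assumptions, not measurements.
tpu_days = 256                      # total compute quoted for BERT
perf_vs_tpu = {"RTX 2080 Ti": 0.70, # assumed 16-bit matmul perf relative to a TPU
               "V100": 0.90}
scaling_eff = 0.95                  # assumed multi-GPU scaling efficiency

for gpu, rel in perf_vs_tpu.items():
    single = tpu_days / rel
    four_card = single / (4 * scaling_eff)
    print(f"{gpu}: ~{single:.0f} GPU-days total, ~{four_card:.0f} days on 4 cards")
# RTX 2080 Ti: ~366 GPU-days total, ~96 days on 4 cards
# V100: ~284 GPU-days total, ~75 days on 4 cards
```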