
Training with XLM-RoBERTa

ssdorsey opened this issue on Feb 25, 2020 · 12 comments

Hi, has anybody looked into training a version of UDify with XLM-RoBERTa? It seems like it could help with the low-resource languages that multilingual BERT handles poorly, so I'm planning on giving it a go if nobody else has already.

— ssdorsey, Feb 25 '20

That's a good idea. Now that I see that Hugging Face has added support for it, it should be straightforward to add support here. I might get around to it, but feel free to try it yourself.

Training on my single GPU might take a while. 🙂

— Hyperparticle, Feb 25 '20

I've got a couple of spare 2080 Tis that should do the trick. I've never used AllenNLP before, so I'm a little unfamiliar with how all these config files work. If you could give me some general guidance on what I'd have to update in the code, I'm happy to take a crack at it and share my results.

— ssdorsey, Feb 25 '20

The first thing to do would be to add the latest transformers release to requirements.txt, which has XLM-RoBERTa support here and here. Then it should be imported into udify/modules/bert_pretrained.py, replacing BertTokenizer/BertModel/BertConfig wherever necessary. Finally, copy config/ud/multilingual/udify_bert_finetune_multilingual.json and modify it to point to xlm-roberta-base instead of bert-base-multilingual-cased (along with a new vocab file, which can be extracted from the pretrained model archive).
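A minimal sketch of that class swap (the XLMRoberta* classes are from transformers; the surrounding UDify code is elided, so treat this as illustrative rather than a drop-in patch):

```python
# udify/modules/bert_pretrained.py (sketch): swap the BERT classes for their
# XLM-R equivalents from the transformers package.
from transformers import XLMRobertaConfig, XLMRobertaModel, XLMRobertaTokenizer

# Wherever the module previously built its pieces from
# BertConfig / BertModel / BertTokenizer, e.g.:
config = XLMRobertaConfig.from_pretrained("xlm-roberta-base")
model = XLMRobertaModel.from_pretrained("xlm-roberta-base", config=config)
tokenizer = XLMRobertaTokenizer.from_pretrained("xlm-roberta-base")
```

One thing to keep in mind: XLM-R uses SentencePiece with `<s>`/`</s>` special tokens rather than WordPiece with `[CLS]`/`[SEP]`, so anything that hard-codes the BERT special tokens needs updating too.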

There might be a few details I missed, but I think that's most of it. I also highly recommend using a debugger inside udify/modules/bert_pretrained.py to see what the data looks like.
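For the config side, the change is mostly repointing the pretrained-model fields; roughly like the following (the field layout here is paraphrased from memory of the AllenNLP-style UDify config, not copied from the actual file, so check it against udify_bert_finetune_multilingual.json):

```json
{
    "dataset_reader": {
        "token_indexers": {
            "bert": {
                "pretrained_model": "config/archive/xlm-roberta-base/vocab.txt"
            }
        }
    },
    "model": {
        "text_field_embedder": {
            "token_embedders": {
                "bert": {
                    "pretrained_model": "xlm-roberta-base"
                }
            }
        }
    }
}
```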

Thanks for offering your help!

— Hyperparticle, Feb 26 '20

Thanks, I'll take a crack at it

— ssdorsey, Feb 26 '20

Followed the steps you outlined and modified a few other things as well (e.g. the special tokens and the tokenizer), but I keep running into AllenNLP errors that I can't quite sort out. I have plenty of compute available if anybody else manages to get this running, but I don't think I'll be able to.

— ssdorsey, Mar 02 '20

Update: came back to this and figured it out. I just had to deal with the differences in how pytorch_pretrained_bert and transformers handle model outputs. Training now.
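For anyone following along, the gist of the difference (a sketch against the transformers v2-era API, not the exact UDify code): pytorch_pretrained_bert returned every encoder layer as a list, while transformers returns a tuple and only includes the hidden states if you ask for them in the config.

```python
import torch
from transformers import XLMRobertaConfig, XLMRobertaModel, XLMRobertaTokenizer

# Old API (pytorch_pretrained_bert) returned the per-layer outputs directly:
#   encoded_layers, pooled = bert(ids, attention_mask=mask,
#                                 output_all_encoded_layers=True)

# New API: hidden states must be requested via the config...
config = XLMRobertaConfig.from_pretrained("xlm-roberta-base",
                                          output_hidden_states=True)
model = XLMRobertaModel.from_pretrained("xlm-roberta-base", config=config)
tokenizer = XLMRobertaTokenizer.from_pretrained("xlm-roberta-base")

enc = tokenizer.encode_plus("a test sentence", return_tensors="pt")
with torch.no_grad():
    # ...and they come back as the last element of the output tuple.
    last_hidden, pooled, hidden_states = model(enc["input_ids"],
                                               attention_mask=enc["attention_mask"])

# hidden_states is (embeddings, layer_1, ..., layer_12); dropping the embedding
# output recovers the per-layer list that the old API gave UDify's layer attention.
encoded_layers = list(hidden_states[1:])
```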

— ssdorsey, Mar 12 '20

How's the training going? Any problems?

— Hyperparticle, Mar 17 '20

I had a few issues with gradient explosion. I had to take a couple of days off, but I'm getting back at it now to see if I can get it going again.
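For anyone hitting the same thing: the usual first knob is gradient clipping, which the AllenNLP trainer exposes directly. A minimal sketch of the relevant trainer fields (values are illustrative, not the ones UDify ships with):

```json
"trainer": {
    "grad_norm": 5.0,
    "grad_clipping": 10.0
}
```

grad_norm rescales the whole gradient when its total norm exceeds the threshold, while grad_clipping clamps individual gradient values.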

— ssdorsey, Mar 19 '20

Hey, I'm trying to train a version of UDify with a BERT-like model, and I was wondering if you have any updates on the changes needed? Thank you in advance @ssdorsey @Hyperparticle

— ArijRB, Mar 17 '21

@ssdorsey or anybody else,

were you able to train the model? I'm looking to do the same with XLM-R, so if you have any experience you can share, it would be really helpful. TIA.

— prashantkodali, Sep 13 '21

Do any of you have any updates regarding the training?

— shaked571, Apr 05 '22

@ssdorsey could you tell us how exactly you handled the outputs in the end? I'm getting this error even after changing the config files: RuntimeError: CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling `cublasCreate(handle)`

I'm trying to use "ai4bharat/indic-bert" as the pre-trained model. The procedure should be very similar to what you did for XLM-RoBERTa.
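For what it's worth, that CUBLAS error often masks an earlier failure (out-of-range token ids hitting the embedding matrix, or plain OOM). A generic sanity check, not UDify-specific (assuming the transformers Auto* loaders can resolve the model):

```python
# Synchronous kernel launches surface the real stack trace instead of a
# deferred CUBLAS failure; this must be set before any CUDA work happens.
import os
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("ai4bharat/indic-bert")
model = AutoModel.from_pretrained("ai4bharat/indic-bert")

# Every id the tokenizer can emit must fit in the embedding matrix;
# a mismatch here tends to show up later as an opaque CUDA/CUBLAS error.
assert model.get_input_embeddings().num_embeddings >= len(tokenizer)
```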

— guptabhinav49, May 10 '22