Training with XLM-RoBERTa
Hi, has anybody looked into training a version of udify with XLM-RoBERTa? It seems like it could help with the low-resource languages in multilingual BERT, so I'm planning on giving it a go if nobody else has already.
That's a good idea. Now that I see that Hugging Face has added support for it, it should be straightforward to add support here. I might get around to it, but feel free to try it yourself.
Training on my single GPU might take a while. 🙂
I've got a couple of spare 2080 Tis that should do the trick. I've never used AllenNLP before, so I'm a little unfamiliar with how all these config files work. If you could give me some general guidance on what I would have to update in the code, I'm happy to take a crack at it and share my results.
The first thing to do would be to add the latest transformers release, which includes XLM-RoBERTa, to requirements.txt. Then it should be imported into udify/modules/bert_pretrained.py, replacing BertTokenizer/BertModel/BertConfig wherever necessary. Finally, copy config/ud/multilingual/udify_bert_finetune_multilingual.json and modify it to point to xlm-roberta-base instead of bert-base-multilingual-cased (along with a new vocab.txt file, which can be extracted from the pretrained model archive).
There might be a few details I missed, but I think that's most of it. I also highly recommend using a debugger inside udify/modules/bert_pretrained.py to see what the data looks like.
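For what it's worth, here is a rough sketch of what that class swap inside udify/modules/bert_pretrained.py could look like. This is my own illustration rather than the actual udify code; the output_hidden_states flag, variable names, and example sentence are assumptions.

```python
# Hypothetical sketch of the swap described above (not the real udify code):
# the pytorch_pretrained_bert classes get replaced by their XLM-R equivalents.
from transformers import XLMRobertaConfig, XLMRobertaModel, XLMRobertaTokenizer

MODEL_NAME = "xlm-roberta-base"  # instead of "bert-base-multilingual-cased"

# udify mixes the hidden states of several layers, so ask transformers to return them all.
config = XLMRobertaConfig.from_pretrained(MODEL_NAME, output_hidden_states=True)
model = XLMRobertaModel.from_pretrained(MODEL_NAME, config=config)
tokenizer = XLMRobertaTokenizer.from_pretrained(MODEL_NAME)

# XLM-R uses <s>/</s> rather than [CLS]/[SEP], so any hard-coded special tokens
# in the wordpiece indexer need to change as well.
token_ids = tokenizer.encode("A short example sentence .", add_special_tokens=True)
```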
Thanks for offering your help!
Thanks, I'll take a crack at it
Followed the steps you outlined and modified a few other things as well (i.e. the special tokens and tokenizer), but I keep running into AllenNLP errors that I can't quite sort out. I have plenty of compute available if anybody else manages to get this running, but I don't think I'll be able to.
Update: came back to this and figured it out. I just had to deal with the differences in how pytorch_pretrained_bert and transformers handle model outputs. Training now.
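In case it helps anyone else hitting the same thing, the difference I believe is being referred to looks roughly like this. This is illustrative only; the dummy ids are made up and the exact tuple contents depend on the transformers version.

```python
import torch
from transformers import XLMRobertaConfig, XLMRobertaModel

config = XLMRobertaConfig.from_pretrained("xlm-roberta-base", output_hidden_states=True)
model = XLMRobertaModel.from_pretrained("xlm-roberta-base", config=config)
input_ids = torch.tensor([[0, 3293, 83, 2]])  # dummy token ids: <s> ... </s>

# Old pytorch_pretrained_bert API that udify was written against:
#   encoded_layers, pooled = bert(input_ids, token_type_ids, attention_mask)
#   where encoded_layers is a *list* with one tensor per layer.
#
# transformers returns a plain tuple instead:
outputs = model(input_ids)
last_hidden_state = outputs[0]  # (batch, seq_len, hidden)
pooled_output = outputs[1]      # (batch, hidden)
all_hidden_states = outputs[2]  # tuple of embedding output + every layer; only present
                                # with output_hidden_states=True, and one entry longer
                                # than the old per-layer list
```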
How's the training going? Any problems?
I had a few issues with gradient explosion. Had to take a couple days off but I'm getting back at it now to see if I can get it going again.
Hey, I am trying to train a version of udify with a BERT-like model, and I was wondering if you have any updates on the changes needed? Thank you in advance @ssdorsey @Hyperparticle
@ssdorsey or anybody else,
were you able to train the model? I'm looking to do the same with XLM-R, so if you have any experience you can share, it would be really helpful. TIA.
Do any of you have any updates regarding the training?
@ssdorsey can you tell us how exactly you handled the outputs in the end? I'm getting this error even after changing the config files:
RuntimeError: CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling `cublasCreate(handle)`
I'm trying to use "ai4bharat/indic-bert" as the pretrained model. The procedure should be very similar to what you would have done for XLM-RoBERTa.
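If it helps, a minimal sketch of pointing the same loading code at indic-bert, assuming the transformers Auto* classes are used instead of hard-coding the XLM-R classes (the flag and variable names here are my assumptions):

```python
from transformers import AutoConfig, AutoModel, AutoTokenizer

MODEL_NAME = "ai4bharat/indic-bert"  # any Hugging Face model id should load the same way

config = AutoConfig.from_pretrained(MODEL_NAME, output_hidden_states=True)
model = AutoModel.from_pretrained(MODEL_NAME, config=config)
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

# The config entry that previously pointed at bert-base-multilingual-cased /
# xlm-roberta-base would be changed to "ai4bharat/indic-bert" as well.
```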