BERT-pytorch
Pretrained model transfer to pytorch
As you all know, it's nearly impossible to train BERT from scratch because of the computation power required. So I'm going to implement transfer code so that the pretrained model can be used in PyTorch too.
This work will start once Google releases their official BERT code and pretrained model. If anyone is interested in joining this effort, please leave a comment below.
Thank you to everyone who is following this project 👍 By Junseong Kim
This issue continues from #3.
I would like to join, even though I'm not sure how much I can do.
Training with the current implementation goes smoothly: I finished training on 10K sentence pairs within 30 minutes, with a final loss of 7.73.
Google has released the source and pre-trained models. https://github.com/google-research/bert
They do note, however, that a TPU is needed to fine-tune the large model: "Includes scripts to reproduce results. BERT-Base can be fine-tuned on a standard GPU; for BERT-Large, a Cloud TPU is required (as max batch size for 12-16 GB is too small)."
I believe fine-tuning can be done on a multi-GPU system by accumulating gradients in PyTorch.
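For reference, gradient accumulation in PyTorch means calling `backward()` on several small batches before a single `optimizer.step()`, which simulates a larger effective batch size on memory-limited GPUs. A minimal sketch (using a hypothetical tiny linear model and random data in place of BERT, just to show the pattern):

```python
import torch
import torch.nn as nn

# Stand-in for a fine-tuning setup: a tiny model and random micro-batches.
# The accumulation pattern itself is what a BERT fine-tuning loop would use.
model = nn.Linear(16, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

accum_steps = 4  # effective batch size = 8 * 4 = 32

optimizer.zero_grad()
for step in range(accum_steps):
    x = torch.randn(8, 16)                      # micro-batch of 8 examples
    y = torch.randint(0, 2, (8,))
    loss = loss_fn(model(x), y) / accum_steps   # scale so gradients average
    loss.backward()                             # gradients accumulate in .grad
optimizer.step()                                # one update for the whole batch
optimizer.zero_grad()
```

Dividing each loss by `accum_steps` makes the accumulated gradient equal to the average over the large batch, matching what a single big batch would produce.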
Apparently I didn't get this done quickly enough. Here is the pre-trained model in PyTorch from the HuggingFace team: https://github.com/huggingface/pytorch-pretrained-BERT
Has this issue been solved? Please let me know. I want to use your implementation together with the pretrained model to realise my ideas.
@ChawDoe At this point you should probably look at the faster DistilBERT from HuggingFace. https://medium.com/huggingface/distilbert-8cf3380435b5
@briandw Thank you.