Brian Williams
> Hi there,
> I trained the model on a big dataset (wiki 2500M + bookscorpus 800M, same as the BERT paper) for 200000 steps and achieved an accuracy of...
Google has released the source and pre-trained models. https://github.com/google-research/bert They do claim that you need a TPU to train the base model, though. "Includes scripts to reproduce results. BERT-Base can be...
I didn't get this done quickly enough apparently. Here is the pre-trained model in PyTorch that the HuggingFace team put together. https://github.com/huggingface/pytorch-pretrained-BERT
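If I read their README right, using the pre-trained weights comes down to a few lines; a minimal sketch (model/tokenizer names as documented in that repo, sentence is just an example):

```python
import torch
from pytorch_pretrained_bert import BertTokenizer, BertModel

# Load the pre-trained tokenizer and encoder (weights are downloaded on first use).
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')
model.eval()

# Tokenize a sentence and map the tokens to vocabulary ids.
tokens = tokenizer.tokenize("[CLS] the quick brown fox jumps over the lazy dog [SEP]")
input_ids = torch.tensor([tokenizer.convert_tokens_to_ids(tokens)])

# Run the encoder; it returns the hidden states of every layer plus the pooled [CLS] output.
with torch.no_grad():
    encoded_layers, pooled_output = model(input_ids)
print(len(encoded_layers), encoded_layers[-1].shape, pooled_output.shape)
```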
@ChawDoe At this point you should probably look at DistilBERT, the smaller and faster version of BERT from HuggingFace. https://medium.com/huggingface/distilbert-8cf3380435b5
The authors plan on releasing the full pre-trained model in a few weeks. The remaining task will be loading their model weights into PyTorch. Perhaps ONNX will work for...
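A rough sketch of what a direct checkpoint conversion could look like, assuming the released weights come as a standard TensorFlow checkpoint and that we work out a TF-variable-name to PyTorch-parameter-name mapping by hand (the `name_map` below is hypothetical):

```python
import numpy as np
import tensorflow as tf
import torch

def tf_checkpoint_to_state_dict(ckpt_path, name_map):
    """Read variables from a TF checkpoint and build a PyTorch state dict.

    name_map: dict from TF variable name -> PyTorch parameter name
    (has to be filled in once the checkpoint layout is known).
    """
    state_dict = {}
    for tf_name, _shape in tf.train.list_variables(ckpt_path):
        if tf_name not in name_map:
            continue  # skip optimizer slots and anything we don't map
        array = tf.train.load_variable(ckpt_path, tf_name)
        tensor = torch.from_numpy(np.asarray(array))
        # Dense kernels in TF are stored transposed relative to torch.nn.Linear.weight.
        if tf_name.endswith('/kernel'):
            tensor = tensor.t()
        state_dict[name_map[tf_name]] = tensor
    return state_dict
```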
I can try to import the Tensor2tensor model into PT. https://github.com/tensorflow/tensor2tensor It should be the same process.
@codertimo Should the goal be to train BERT from scratch or to fine-tune the model? I'd say that scratch training isn't realistic right now. Fine-tuning shouldn't be that resource-intensive...
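For what it's worth, fine-tuning mostly comes down to putting a small task head on top of the pre-trained encoder and training everything with a low learning rate. A minimal sketch; the `encoder` here is just a stand-in `nn.TransformerEncoder` with BERT-Base dimensions, not the actual pre-trained weights:

```python
import torch
import torch.nn as nn

# Stand-in for the pre-trained BERT-Base encoder (in practice, load the released weights).
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=768, nhead=12, dim_feedforward=3072),
    num_layers=12,
)
embedding = nn.Embedding(30522, 768)  # BERT-Base vocab size / hidden size
classifier = nn.Linear(768, 2)        # task-specific head, trained from scratch

params = list(encoder.parameters()) + list(embedding.parameters()) + list(classifier.parameters())
optimizer = torch.optim.Adam(params, lr=3e-5)  # small LR so the pre-trained weights aren't wrecked
loss_fn = nn.CrossEntropyLoss()

# One toy step: (batch, seq_len) token ids and binary labels.
input_ids = torch.randint(0, 30522, (8, 128))
labels = torch.randint(0, 2, (8,))

hidden = encoder(embedding(input_ids).transpose(0, 1))  # (seq_len, batch, hidden)
logits = classifier(hidden[0])                          # first-token ([CLS]-style) representation
loss = loss_fn(logits, labels)
loss.backward()
optimizer.step()
```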
Here's a paper: https://arxiv.org/abs/1608.05859 With weight tying, the memory requirement is lower and training should be faster (I believe).
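In PyTorch, tying the input embedding to the output projection is just sharing one Parameter between the two modules; a minimal sketch (class and attribute names are made up, not from this repo):

```python
import torch.nn as nn

class TiedLMHead(nn.Module):
    """Toy LM head with the softmax projection tied to the input embedding."""

    def __init__(self, vocab_size=30522, hidden_size=768):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, hidden_size)
        self.decoder = nn.Linear(hidden_size, vocab_size, bias=False)
        # Weight tying: both layers now share one (vocab_size, hidden_size) matrix,
        # so it is stored and updated only once.
        self.decoder.weight = self.embedding.weight

    def forward(self, hidden_states):
        return self.decoder(hidden_states)
```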
I'm seeing the same error, but I believe it's related to these issues in bitsandbytes: https://github.com/TimDettmers/bitsandbytes/issues/383 and https://github.com/TimDettmers/bitsandbytes/issues/599
You can drag the tags, but it crashes every time I do that.