Brian Williams
> Hi there,
> I trained the model on a big dataset (wiki 2500M + bookscorpus 800M, same as the BERT paper) for 200000 steps and achieved an accuracy of...
Google has released the source and pre-trained models. https://github.com/google-research/bert They do claim that you need a TPU to train the base model, though. "Includes scripts to reproduce results. BERT-Base can be...
I didn't get this done quickly enough apparently. Here is the pre-trained model in PyTorch that the HuggingFace team put together. https://github.com/huggingface/pytorch-pretrained-BERT
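If I read their README right, using the pre-trained weights comes down to a few lines; a minimal sketch (model/tokenizer names as documented in that repo, sentence is just an example):

```python
import torch
from pytorch_pretrained_bert import BertTokenizer, BertModel

# Load the pre-trained tokenizer and encoder (weights are downloaded on first use).
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')
model.eval()

# Tokenize a sentence and map the tokens to vocabulary ids.
tokens = tokenizer.tokenize("[CLS] the quick brown fox jumps over the lazy dog [SEP]")
input_ids = torch.tensor([tokenizer.convert_tokens_to_ids(tokens)])

# Run the encoder; it returns the hidden states of every layer plus the pooled [CLS] output.
with torch.no_grad():
    encoded_layers, pooled_output = model(input_ids)
print(len(encoded_layers), encoded_layers[-1].shape, pooled_output.shape)
```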
@ChawDoe At this point you should probably look at DistilBERT, the smaller and faster version of BERT from HuggingFace. https://medium.com/huggingface/distilbert-8cf3380435b5
The authors plan on releasing the full pre-trained model in a few weeks. The remaining task will be loading their model weights into PyTorch. Perhaps ONNX will work for...
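A rough sketch of what a direct checkpoint conversion could look like, assuming the released weights come as a standard TensorFlow checkpoint and that we work out a TF-variable-name to PyTorch-parameter-name mapping by hand (the `name_map` below is hypothetical):

```python
import numpy as np
import tensorflow as tf
import torch

def tf_checkpoint_to_state_dict(ckpt_path, name_map):
    """Read variables from a TF checkpoint and build a PyTorch state dict.

    name_map: dict from TF variable name -> PyTorch parameter name
    (has to be filled in once the checkpoint layout is known).
    """
    state_dict = {}
    for tf_name, _shape in tf.train.list_variables(ckpt_path):
        if tf_name not in name_map:
            continue  # skip optimizer slots and anything we don't map
        array = tf.train.load_variable(ckpt_path, tf_name)
        tensor = torch.from_numpy(np.asarray(array))
        # Dense kernels in TF are stored transposed relative to torch.nn.Linear.weight.
        if tf_name.endswith('/kernel'):
            tensor = tensor.t()
        state_dict[name_map[tf_name]] = tensor
    return state_dict
```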
I can try to import the Tensor2tensor model into PT. https://github.com/tensorflow/tensor2tensor It should be the same process.
@codertimo Should the goal be to train BERT from scratch or to fine-tune the model? I'd say that scratch training isn't realistic right now. Fine-tuning shouldn't be that resource-intensive...
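For what it's worth, fine-tuning mostly comes down to putting a small task head on top of the pre-trained encoder and training everything with a low learning rate. A minimal sketch; the `encoder` here is just a stand-in `nn.TransformerEncoder` with BERT-Base dimensions, not the actual pre-trained weights:

```python
import torch
import torch.nn as nn

# Stand-in for the pre-trained BERT-Base encoder (in practice, load the released weights).
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=768, nhead=12, dim_feedforward=3072),
    num_layers=12,
)
embedding = nn.Embedding(30522, 768)  # BERT-Base vocab size / hidden size
classifier = nn.Linear(768, 2)        # task-specific head, trained from scratch

params = list(encoder.parameters()) + list(embedding.parameters()) + list(classifier.parameters())
optimizer = torch.optim.Adam(params, lr=3e-5)  # small LR so the pre-trained weights aren't wrecked
loss_fn = nn.CrossEntropyLoss()

# One toy step: (batch, seq_len) token ids and binary labels.
input_ids = torch.randint(0, 30522, (8, 128))
labels = torch.randint(0, 2, (8,))

hidden = encoder(embedding(input_ids).transpose(0, 1))  # (seq_len, batch, hidden)
logits = classifier(hidden[0])                          # first-token ([CLS]-style) representation
loss = loss_fn(logits, labels)
loss.backward()
optimizer.step()
```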
Here's a paper: https://arxiv.org/abs/1608.05859 With weight tying, the memory requirement is lower and training should be faster (I believe).
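In PyTorch, tying the input embedding to the output projection is just sharing one Parameter between the two modules; a minimal sketch (class and attribute names are made up, not from this repo):

```python
import torch.nn as nn

class TiedLMHead(nn.Module):
    """Toy LM head with the softmax projection tied to the input embedding."""

    def __init__(self, vocab_size=30522, hidden_size=768):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, hidden_size)
        self.decoder = nn.Linear(hidden_size, vocab_size, bias=False)
        # Weight tying: both layers now share one (vocab_size, hidden_size) matrix,
        # so it is stored and updated only once.
        self.decoder.weight = self.embedding.weight

    def forward(self, hidden_states):
        return self.decoder(hidden_states)
```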
I'm seeing the same error, but I believe it's related to these issues in bitsandbytes: https://github.com/TimDettmers/bitsandbytes/issues/383 and https://github.com/TimDettmers/bitsandbytes/issues/599
You can drag the tags, but it crashes every time I do that.