bert
TensorFlow code and pre-trained models for BERT
I'm trying to create pretraining data on a GPU server. However, while running `create_pre_training_data.py`, I found that there is no process running on the GPU. How can I run the...
When the key, query, and value matrices are generated, which weights are used? How do I print these weights?
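For context, Q, K, and V are each produced by a separate learned dense projection of the same hidden states. A minimal NumPy sketch of those projections (shapes follow BERT-Base; the `W_q`/`W_k`/`W_v` names are illustrative, not the checkpoint's variable names):

```python
import numpy as np

rng = np.random.default_rng(0)
hidden, seq_len = 768, 4
x = rng.standard_normal((seq_len, hidden))          # hidden states entering the attention layer

# Three independent projection matrices, one per Q/K/V (randomly initialized here).
W_q = rng.standard_normal((hidden, hidden)) * 0.02
W_k = rng.standard_normal((hidden, hidden)) * 0.02
W_v = rng.standard_normal((hidden, hidden)) * 0.02

Q, K, V = x @ W_q, x @ W_k, x @ W_v
print(Q.shape, K.shape, V.shape)  # each (4, 768)
```

To inspect the actual trained weights, you can list the checkpoint's variables with `tf.train.list_variables(checkpoint_path)` and look for names containing `attention/self/query/kernel`, `attention/self/key/kernel`, and `attention/self/value/kernel`.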
For anyone who wants to tune the pre-trained model offered by Google on their own domain-specific corpus for several additional epochs using a Google TPU, here is a tutorial I just...
Hello! Would it be possible to release the collateral (scripts, hyperparameters, etc.) to reproduce the pre-training distillation (PD) results you presented in the Well-Read Students Learn Better paper (the...
Hi, I've been interested in quantization, pruning, and distillation so far. I really appreciate that you provide the 24 smaller models, and if possible, I'd like you to let me know...
MLM acc 0.558 and NSP acc 0.987 -- I think this looks like overfitting. In fact, in my experiments I found that MLM accuracy is very hard to improve and...
Why do we need to multiply the output (None, seqlen, 768) by the embedding_table (21128, 768) in the last step of the MLM task? Is there a mathematical explanation?
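The multiplication ties the output layer to the input embeddings (weight tying): each position's logit for token t is the dot product between that position's hidden vector and token t's embedding row. A small sketch of the shapes involved (random values, dimensions from the question):

```python
import numpy as np

rng = np.random.default_rng(0)
batch, seqlen, hidden, vocab = 2, 8, 768, 21128

output = rng.standard_normal((batch, seqlen, hidden))          # transformer output
embedding_table = rng.standard_normal((vocab, hidden)) * 0.02  # input token embeddings

# Reusing the embedding table as the output projection: logit[b, s, t] is the
# dot product of hidden vector output[b, s] with embedding row t.
logits = output @ embedding_table.T
print(logits.shape)  # (2, 8, 21128) -- one score per vocabulary token
```

Mathematically, this scores each hidden state against every token embedding by similarity, and it saves a separate vocab-sized output matrix.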
Is there any pre-trained model of Bert or a similar tool which has been trained on the widest body of knowledge possible to be an effective general purpose question answering...
There are "no attribute" errors such as `AttributeError: module 'tensorflow' has no attribute 'gfile'` and `AttributeError: module 'tensorflow' has no attribute 'flags'`, because TensorFlow 2 is installed automatically by `pip install...
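Those attributes (`tf.gfile`, `tf.flags`) exist only in the TensorFlow 1.x API that this repo targets. One fix is to pin a 1.x release in your environment (the exact version bound is an assumption; check the repo's stated requirements):

```shell
# Install a TensorFlow 1.x release instead of the default 2.x.
pip install "tensorflow>=1.11,<2"
```

Alternatively, TF2 ships a compatibility shim: replacing `import tensorflow as tf` with `import tensorflow.compat.v1 as tf` restores the 1.x names such as `tf.gfile` and `tf.flags`.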
It looks like the length of "attention_heads" is always 1 in the "transformer_model" function, yet the code in "modeling.py" has an "if-else" statement. Can we remove the "attention_heads"...
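For readers of that issue: only one `attention_layer` output is ever appended per block, so the `else` branch is dead in the released code and appears to be a leftover generality hook. A simplified plain-Python sketch of the pattern (the `attention_layer` stub here stands in for the real attention op):

```python
def attention_layer(x):
    # Stand-in for multi-head self-attention; the real version returns a tensor.
    return [v * 2 for v in x]

def transformer_block(x):
    attention_heads = []
    attention_heads.append(attention_layer(x))  # the only append: len() is always 1
    if len(attention_heads) == 1:
        attention_output = attention_heads[0]   # always taken
    else:
        # Dead branch in the released code; presumably kept so several
        # attention outputs could be combined if the list ever grew.
        attention_output = [sum(vs) for vs in zip(*attention_heads)]
    return attention_output

print(transformer_block([1, 2, 3]))  # [2, 4, 6]
```

So the `if` could be collapsed to `attention_output = attention_heads[0]` without changing behavior, though keeping the list costs nothing.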