
BERT trained on custom corpus

anidiatm41 opened this issue 4 years ago • 1 comment

Hi M. H. Kwon, your tokenization script is really helpful.

I trained a BERT model on a custom corpus using Google's scripts (create_pretraining_data.py, run_pretraining.py, extract_features.py, etc.). As a result I got a vocab file, a .tfrecord file, a .json file, and checkpoint files.

Now, how do I use those files for the tasks below:

  1. predicting a missing word in a given sentence
  2. next sentence prediction
  3. a Q&A model

Need your help.

anidiatm41 avatar Oct 10 '20 05:10 anidiatm41

Hi, anidiatm41, Thank you.

For 3, the Q&A model, visit the official BERT GitHub repository. It has instructions for fine-tuning on tasks like question answering (SQuAD).

Predicting missing words and next sentence prediction are usually used only for training. If you want to predict missing words for a practical purpose, you need to write your own code. You can refer to the evaluation part of run_pretraining.py; it's almost the same.
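
For reference, here is a minimal, untested sketch of that idea in TensorFlow 1.x. It assumes you run it from inside the official google-research/bert repo (so modeling.py and tokenization.py import), and the config/vocab/checkpoint paths are placeholders for your own pretraining outputs. The masked-LM head is rebuilt to mirror get_masked_lm_output() in run_pretraining.py so the variable names line up with what the pretraining checkpoint contains:

```python
# Minimal masked-word prediction sketch (TF 1.x, google-research/bert on the path).
import tensorflow as tf
import modeling
import tokenization

# Placeholder paths -- point these at your own pretraining outputs.
BERT_CONFIG = "bert_config.json"
VOCAB_FILE = "vocab.txt"
INIT_CHECKPOINT = "model.ckpt-100000"   # checkpoint written by run_pretraining.py
MAX_SEQ_LENGTH = 128

bert_config = modeling.BertConfig.from_json_file(BERT_CONFIG)
tokenizer = tokenization.FullTokenizer(vocab_file=VOCAB_FILE, do_lower_case=True)

# Build the token list by hand so "[MASK]" is not split up by the tokenizer.
tokens = (["[CLS]"] + tokenizer.tokenize("the man went to the") +
          ["[MASK]"] + tokenizer.tokenize("to buy some milk .") + ["[SEP]"])
mask_position = tokens.index("[MASK]")
input_ids = tokenizer.convert_tokens_to_ids(tokens)
input_mask = [1] * len(input_ids)
while len(input_ids) < MAX_SEQ_LENGTH:   # pad to a fixed length
  input_ids.append(0)
  input_mask.append(0)

ids_ph = tf.placeholder(tf.int32, [1, MAX_SEQ_LENGTH])
mask_ph = tf.placeholder(tf.int32, [1, MAX_SEQ_LENGTH])
model = modeling.BertModel(
    config=bert_config,
    is_training=False,
    input_ids=ids_ph,
    input_mask=mask_ph,
    token_type_ids=tf.zeros_like(ids_ph),
    use_one_hot_embeddings=False)

# Rebuild the masked-LM head the same way get_masked_lm_output() does in
# run_pretraining.py, so the variables restore from the same checkpoint names.
sequence_output = model.get_sequence_output()          # [1, seq_len, hidden]
with tf.variable_scope("cls/predictions"):
  with tf.variable_scope("transform"):
    transformed = tf.layers.dense(
        sequence_output,
        units=bert_config.hidden_size,
        activation=modeling.get_activation(bert_config.hidden_act),
        kernel_initializer=modeling.create_initializer(
            bert_config.initializer_range))
    transformed = modeling.layer_norm(transformed)
  output_bias = tf.get_variable(
      "output_bias", shape=[bert_config.vocab_size],
      initializer=tf.zeros_initializer())

# The output weights are tied to the word embedding table.
logits = tf.matmul(transformed[0], model.get_embedding_table(),
                   transpose_b=True) + output_bias
predicted_ids = tf.argmax(logits, axis=-1)              # [seq_len]

with tf.Session() as sess:
  tf.train.Saver().restore(sess, INIT_CHECKPOINT)
  preds = sess.run(predicted_ids,
                   feed_dict={ids_ph: [input_ids], mask_ph: [input_mask]})
  print("Predicted token:",
        tokenizer.convert_ids_to_tokens([preds[mask_position]])[0])
```

Next sentence prediction can be done the same way: rebuild the "cls/seq_relationship" head over model.get_pooled_output(), mirroring get_next_sentence_output() in run_pretraining.py, and restore it from the same checkpoint.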

kwonmha avatar Oct 13 '20 01:10 kwonmha