
BERT trained on custom corpus

anidiatm41 opened this issue 4 years ago • 1 comment

Hi M. H. Kwon, your tokenization script is really helpful.

I trained a BERT model on a custom corpus using Google's scripts (create_pretraining_data.py, run_pretraining.py, extract_features.py, etc.). As a result I got a vocab file, a .tfrecord file, a .json file, and checkpoint files.

Now, how do I use those files for the tasks below:

  1. predicting a missing word in a given sentence
  2. next sentence prediction
  3. a Q&A model

Need your help.

anidiatm41 avatar Oct 10 '20 05:10 anidiatm41

Hi, anidiatm41, Thank you.

For 3, the Q&A model, visit the official BERT GitHub repository. It has instructions for fine-tuning on tasks like question answering (SQuAD).

Predicting missing words and next sentence prediction are usually used only for training. If you want to predict missing words for a practical purpose, you need to write your own code. You can refer to the evaluation part of run_pretraining.py; it's almost the same.
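
For reference, here is a minimal, untested sketch of that idea in TensorFlow 1.x. It assumes you run it from inside the official google-research/bert repo (so modeling.py and tokenization.py import), and the config/vocab/checkpoint paths are placeholders for your own pretraining outputs. The masked-LM head is rebuilt to mirror get_masked_lm_output() in run_pretraining.py so the variable names line up with what the pretraining checkpoint contains:

```python
# Minimal masked-word prediction sketch (TF 1.x, google-research/bert on the path).
import tensorflow as tf
import modeling
import tokenization

# Placeholder paths -- point these at your own pretraining outputs.
BERT_CONFIG = "bert_config.json"
VOCAB_FILE = "vocab.txt"
INIT_CHECKPOINT = "model.ckpt-100000"   # checkpoint written by run_pretraining.py
MAX_SEQ_LENGTH = 128

bert_config = modeling.BertConfig.from_json_file(BERT_CONFIG)
tokenizer = tokenization.FullTokenizer(vocab_file=VOCAB_FILE, do_lower_case=True)

# Build the token list by hand so "[MASK]" is not split up by the tokenizer.
tokens = (["[CLS]"] + tokenizer.tokenize("the man went to the") +
          ["[MASK]"] + tokenizer.tokenize("to buy some milk .") + ["[SEP]"])
mask_position = tokens.index("[MASK]")
input_ids = tokenizer.convert_tokens_to_ids(tokens)
input_mask = [1] * len(input_ids)
while len(input_ids) < MAX_SEQ_LENGTH:   # pad to a fixed length
  input_ids.append(0)
  input_mask.append(0)

ids_ph = tf.placeholder(tf.int32, [1, MAX_SEQ_LENGTH])
mask_ph = tf.placeholder(tf.int32, [1, MAX_SEQ_LENGTH])
model = modeling.BertModel(
    config=bert_config,
    is_training=False,
    input_ids=ids_ph,
    input_mask=mask_ph,
    token_type_ids=tf.zeros_like(ids_ph),
    use_one_hot_embeddings=False)

# Rebuild the masked-LM head the same way get_masked_lm_output() does in
# run_pretraining.py, so the variables restore from the same checkpoint names.
sequence_output = model.get_sequence_output()          # [1, seq_len, hidden]
with tf.variable_scope("cls/predictions"):
  with tf.variable_scope("transform"):
    transformed = tf.layers.dense(
        sequence_output,
        units=bert_config.hidden_size,
        activation=modeling.get_activation(bert_config.hidden_act),
        kernel_initializer=modeling.create_initializer(
            bert_config.initializer_range))
    transformed = modeling.layer_norm(transformed)
  output_bias = tf.get_variable(
      "output_bias", shape=[bert_config.vocab_size],
      initializer=tf.zeros_initializer())

# The output weights are tied to the word embedding table.
logits = tf.matmul(transformed[0], model.get_embedding_table(),
                   transpose_b=True) + output_bias
predicted_ids = tf.argmax(logits, axis=-1)              # [seq_len]

with tf.Session() as sess:
  tf.train.Saver().restore(sess, INIT_CHECKPOINT)
  preds = sess.run(predicted_ids,
                   feed_dict={ids_ph: [input_ids], mask_ph: [input_mask]})
  print("Predicted token:",
        tokenizer.convert_ids_to_tokens([preds[mask_position]])[0])
```

Next sentence prediction can be done the same way: rebuild the "cls/seq_relationship" head over model.get_pooled_output(), mirroring get_next_sentence_output() in run_pretraining.py, and restore it from the same checkpoint.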

kwonmha avatar Oct 13 '20 01:10 kwonmha