electra [How to create vocab.txt file]

Can you explain how to build a vocab.txt file?

May 19 '20 01:05 Vietdung113

It's the same procedure you would use to create a vocab file for BERT. For example, you could use the excellent Tokenizers library from Hugging Face for that.

I documented the vocab generation steps for my Turkish BERT model; see them here.

After that, you can use this vocab for training a new model :)
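As a rough illustration of that procedure, here is a minimal sketch using `BertWordPieceTokenizer` from the Tokenizers library. It is not the exact script used for the Turkish model: the function name `train_wordpiece_vocab`, the paths, the vocab size, and the casing options are all placeholder assumptions, and `save_model` assumes a reasonably recent Tokenizers release.

```python
import os


def train_wordpiece_vocab(corpus_path, out_dir, vocab_size=32000):
    """Train a BERT-style WordPiece vocab and write <out_dir>/vocab.txt.

    Hypothetical helper: the name and default values are illustrative only.
    """
    # Imported inside the function so this file can be loaded even when the
    # `tokenizers` package is not installed.
    from tokenizers import BertWordPieceTokenizer

    # Cased model, accents kept: an assumption, adjust for your language.
    tokenizer = BertWordPieceTokenizer(
        clean_text=True,
        handle_chinese_chars=False,
        strip_accents=False,
        lowercase=False,
    )

    # The corpus is a plain-text file; the special tokens are the usual BERT ones.
    tokenizer.train(
        files=[corpus_path],
        vocab_size=vocab_size,
        min_frequency=2,
        special_tokens=["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]"],
        wordpieces_prefix="##",
    )

    # Writes vocab.txt (one token per line) into out_dir.
    os.makedirs(out_dir, exist_ok=True)
    tokenizer.save_model(out_dir)
    return os.path.join(out_dir, "vocab.txt")
```

Calling something like `train_wordpiece_vocab("corpus.txt", "vocab_out")` should leave a `vocab.txt` in `vocab_out`, which can then be passed to the pre-training scripts.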

May 19 '20 07:05 stefan-it

Thanks for your response. Can you describe the format of your "tr_final" file, or share a sample of it?

May 21 '20 02:05 Vietdung113

I used a sentence-segmented training corpus. That means, each line of the input file contains one sentence :)
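To make the format concrete, here is a small sketch that turns raw text into a one-sentence-per-line file. The regex splitter is a deliberately naive stand-in (a proper sentence segmenter will do better on abbreviations and the like), and `write_sentence_per_line` is a hypothetical helper name, not something from the original setup.

```python
import re


def write_sentence_per_line(text, out_path):
    """Write `text` to `out_path` with one sentence per line."""
    # Naive splitter: break after '.', '!' or '?' when followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text)
    sentences = [p.strip() for p in parts if p.strip()]

    with open(out_path, "w", encoding="utf-8") as f:
        for sentence in sentences:
            f.write(sentence + "\n")
    return sentences
```

The resulting file has no markup at all: just plain text, one sentence on each line.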

May 22 '20 11:05 stefan-it

stefan-it wrote: "I used a sentence-segmented training corpus. That means, each line of the input file contains one sentence :)"

Is one sentence per line OK? I am asking because there is an option called blanks-separate-docs: "Whether blank lines indicate document boundaries (True by default)."

This sounds like you need blank lines to separate documents.
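To spell out how I read that option description, here is a small sketch of a reader that treats blank lines as document boundaries. It only illustrates the behaviour the description implies; it is not the actual ELECTRA code, and `split_documents` is a made-up name.

```python
def split_documents(lines, blanks_separate_docs=True):
    """Group sentence lines into documents.

    With blanks_separate_docs=True, a blank line ends the current document.
    With False, blank lines are ignored and everything is one document.
    """
    docs, current = [], []
    for line in lines:
        line = line.strip()
        if not line:
            # Blank line: close the current document only if the option is on.
            if blanks_separate_docs and current:
                docs.append(current)
                current = []
            continue
        current.append(line)
    if current:
        docs.append(current)
    return docs
```

Under this reading, a file with one sentence per line and no blank lines at all would end up as a single large document.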

Jul 17 '20 20:07 PhilipMay