ALBERT-Pytorch

What is the corpus format of train.txt for ELECTRA pre-training?

Open • marcusau opened this issue 5 years ago • 1 comment

If I want to use my own text data to pre-train an ELECTRA model from scratch, what format should the text in train.txt be in?

Is sentence segmentation alone enough, or is more preprocessing required?

Please help.

marcusau, May 11 '20 15:05

@marcusau, hey, were you able to figure out how to pre-train on a completely new corpus? I am trying to pre-train on a new language, but I don't understand how to produce the train.tokens and vocab.txt files.

StephennFernandes, Apr 12 '22 06:04
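
The thread does not state which format this repository expects, so the following is only a sketch under an assumption: that train.txt follows the convention documented for the original BERT pre-training scripts, with one sentence per line and a blank line between documents. The function names `split_sentences`, `format_corpus`, and `write_corpus` are hypothetical helpers, not part of ALBERT-Pytorch, and the regex splitter is a naive stand-in for a proper sentence segmenter. Producing vocab.txt (typically a WordPiece vocabulary trained separately) is not covered here.

```python
import re


def split_sentences(text):
    """Naive sentence splitter (assumption: ., ! or ? followed by whitespace ends a sentence)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def format_corpus(documents):
    """Return corpus text with one sentence per line and a blank line between documents.

    This layout is an assumption borrowed from the BERT pre-training convention;
    it has not been confirmed for this repository.
    """
    blocks = []
    for doc in documents:
        sentences = split_sentences(doc)
        if sentences:  # skip empty documents
            blocks.append("\n".join(sentences))
    return "\n\n".join(blocks) + "\n"


def write_corpus(documents, path):
    """Write the formatted corpus to `path` (for example, train.txt)."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(format_corpus(documents))


docs = [
    "ELECTRA trains a discriminator. It detects replaced tokens!",
    "A second document goes here. Is it separate? Yes.",
]
corpus = format_corpus(docs)
```

Calling `write_corpus(docs, "train.txt")` would save the result to disk. For a language without whitespace-delimited sentences or with different punctuation, the splitter would need to be replaced with a language-appropriate segmenter.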