ALBERT-Pytorch
What is the corpus format of train.txt for ELECTRA pre-training?
If I want to use my own text data to pre-train an ELECTRA model from scratch, what format should the text be in?
Is sentence segmentation enough, or is more preprocessing needed?
Please help.
@marcusau, hey, were you able to figure out how to pre-train on a completely new corpus? I am trying to pre-train on a new language but don't understand how to produce the train.tokens and vocab.txt files.