textgenrnn

Tokenizing Dataset Fails with newline or index error

Open leetfin opened this issue 2 years ago • 0 comments

When I try to tokenize a dataset, it fails with one of two errors: either "Error: new-line character seen in unquoted field - do you need to open the file in universal-newline mode?" or an error about a list index being out of range. I am running the newest version of the Colab notebook, and this happens with both GPT-2 and GPT-Neo.
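For reference, here is a minimal sketch of how the first message can come out of Python's standard `csv` module. It assumes, without confirmation for this dataset or the notebook's tokenizer, that the text contains bare carriage returns (`\r`) inside a line. The `raw` sample and the `normalize_newlines` helper are hypothetical and only illustrate the idea; the exact wording of the error may differ between Python versions.

```python
import csv


def normalize_newlines(text: str) -> str:
    # Convert CRLF and bare CR line endings to LF (hypothetical helper).
    return text.replace("\r\n", "\n").replace("\r", "\n")


# Hypothetical sample: a bare carriage return sits in the middle of a line.
raw = "first line\rsecond line\nthird line\n"

# Splitting only on "\n" leaves the "\r" embedded in the first string,
# which the csv reader rejects.
try:
    list(csv.reader(raw.split("\n")))
    parse_failed = False
except csv.Error as exc:
    parse_failed = True
    print("csv.Error:", exc)

# After normalizing the line endings, the same text parses cleanly.
clean = normalize_newlines(raw)
rows = list(csv.reader(clean.splitlines()))
print(rows)
```

If this is the cause, normalizing the line endings of the dataset file before tokenizing may be worth trying. This sketch does not cover the list index error.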

Please let me know what info is needed or what I can try to fix this.

Thanks!

leetfin, Nov 09 '21 18:11