textgenrnn
Tokenizing Dataset Fails with newline or index error
When trying to tokenize a dataset, it fails with one of two errors. Either:
Error: new-line character seen in unquoted field - do you need to open the file in universal-newline mode?
or an error about a list index being out of range.
I'm running the newest version of the Colab notebook, and this happens with both GPT-2 and GPT-Neo.
Please let me know what info is needed or what I can try to fix this.
Thanks!