textgenrnn Tokenizing Dataset Fails with newline or index error

Open leetfin opened this issue 2 years ago • 0 comments

Tokenizing a dataset fails with one of two errors: either "Error: new-line character seen in unquoted field - do you need to open the file in universal-newline mode?" or one about a list index being out of range. I'm running the newest version of the Colab notebook, and this happens with both GPT-2 and GPT-Neo.
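For context, the first message is raised by Python's built-in csv module when it meets a bare carriage return inside an unquoted field, which usually points to old Mac or mixed line endings in the input file. The sketch below reproduces that condition and shows one possible workaround: normalizing line endings before the text is parsed. It assumes the dataset is a text or CSV file read line by line; the sample string and the `normalize_newlines` helper are hypothetical and not part of the notebook, and nothing here addresses the index error.

```python
import csv
import io


def normalize_newlines(text: str) -> str:
    # Hypothetical helper: convert CRLF and bare CR line endings to LF.
    return text.replace("\r\n", "\n").replace("\r", "\n")


# Made-up sample: a bare carriage return in the middle of an unquoted field.
raw = "first line\rsecond line\n"

# Parsing the raw string as a single line triggers the csv error.
try:
    list(csv.reader([raw]))
    failed = False
except csv.Error:
    failed = True

# After normalizing, the same text parses into one row per line.
cleaned = normalize_newlines(raw)
rows = list(csv.reader(io.StringIO(cleaned, newline="")))
```

If this is the cause, applying the same normalization to the dataset file before uploading it to Colab may be enough to get past the first error.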

Please let me know what info is needed or what I can try to fix this.

Thanks!

Nov 09 '21 18:11 leetfin
