training-charRNN

UnicodeDecodeError

Open iamsarthakk opened this issue 5 years ago • 1 comments

Getting the following message while training:

File "train.py", line 179, in <module>
    main()
File "train.py", line 76, in main
    train(args)
File "train.py", line 91, in train
    data_loader = TextLoader(args.data_dir, args.batch_size, args.seq_length)
File "/spell/training-lstm/utils.py", line 21, in __init__
    self.preprocess(input_file, vocab_file, tensor_file)
File "/spell/training-lstm/utils.py", line 30, in preprocess
    data = f.read()
File "/usr/lib/python3.5/codecs.py", line 698, in read
    return self.reader.read(size)
File "/usr/lib/python3.5/codecs.py", line 501, in read
    newchars, decodedbytes = self.decode(data, self.errors)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x92 in position 7047: invalid start byte

iamsarthakk, Oct 06 '18 18:10

This looks like an encoding issue with your text: the input file is not valid UTF-8. Try replacing line 91 in train.py:

data_loader = TextLoader(args.data_dir, args.batch_size, args.seq_length)

with

data_loader = TextLoader(args.data_dir, args.batch_size, args.seq_length, "ISO-8859-1")

cvalenzuela, Oct 07 '18 02:10
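
For anyone hitting the same error, here is a minimal sketch of why the suggested change helps. It assumes the offending byte comes from a Windows-1252 "smart" apostrophe, which is a common source of 0x92 but is not confirmed for this dataset. The sample bytes and the try_decode helper are hypothetical and are not part of the repo.

```python
# Hypothetical sample: "don't" saved with a Windows-1252 curly apostrophe (byte 0x92).
raw = b"don\x92t"


def try_decode(data, encoding):
    """Return the decoded text, or None if the bytes are invalid in that encoding."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


# 0x92 is a UTF-8 continuation byte, so it cannot start a character.
utf8_text = try_decode(raw, "utf-8")

# ISO-8859-1 maps every byte to a code point, so decoding never fails,
# but 0x92 becomes the control character U+0092 rather than an apostrophe.
latin1_text = try_decode(raw, "ISO-8859-1")

# cp1252 (Windows-1252) maps 0x92 to U+2019, the right single quotation mark.
cp1252_text = try_decode(raw, "cp1252")

print(repr(utf8_text), repr(latin1_text), repr(cp1252_text))
```

ISO-8859-1 makes the error go away because it accepts any byte sequence. If the text was really written as Windows-1252, passing "cp1252" instead may give cleaner characters in the vocabulary, assuming TextLoader forwards that argument to the file reader the way the reply above implies.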