stanford-tensorflow-tutorials

Lecture 11: 11_char_rnn ... 'charmap' codec can't decode byte 0x81 in position 170 ...

Open terminsen opened this issue 7 years ago • 2 comments

Hi -

I get the traceback below when running the lecture 11 char RNN example. Can you help with this one, please?

Kind regards, Jesper.


UnicodeDecodeError                        Traceback (most recent call last)
in <module>()
    148
    149 if __name__ == '__main__':
--> 150     main()

in main()
    145     lm = CharRNN(model)
    146     lm.create_model()
--> 147     lm.train()
    148
    149 if __name__ == '__main__':

in train(self)
    106         data = read_batch(stream, self.batch_size)
    107         while True:
--> 108             batch = next(data)
    109
    110             # for batch in read_batch(read_data(DATA_PATH, vocab)):

in read_batch(stream, batch_size)
     38 def read_batch(stream, batch_size):
     39     batch = []
---> 40     for element in stream:
     41         batch.append(element)
     42         if len(batch) == batch_size:

in read_data(filename, vocab, window, overlap)
     25
     26 def read_data(filename, vocab, window, overlap):
---> 27     lines = [line.strip() for line in open(filename, 'r').readlines()]
     28     while True:
     29         random.shuffle(lines)

~\Anaconda3\lib\encodings\cp1252.py in decode(self, input, final)
     21 class IncrementalDecoder(codecs.IncrementalDecoder):
     22     def decode(self, input, final=False):
---> 23         return codecs.charmap_decode(input,self.errors,decoding_table)[0]
     24
     25 class StreamWriter(Codec,codecs.StreamWriter):

UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 170: character maps to <undefined>

terminsen, Jun 12 '18 13:06

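This is a file encoding issue. Without an explicit encoding argument, open() falls back to the platform default (cp1252 here, as the last frame of the traceback shows), and byte 0x81 has no mapping in that codec. Change line 27 to:

    lines = [line.strip() for line in open(filename, 'r', encoding="utf-8").readlines()]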
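For anyone who wants to check this outside the notebook, here is a minimal sketch of the same fix. It assumes the data file really is UTF-8 encoded, which the thread suggests but does not state outright. The helper name read_lines and the sample text are hypothetical and not part of the tutorial code; the sample is chosen so that its UTF-8 bytes contain 0x81.

```python
import os
import tempfile


def read_lines(filename, encoding="utf-8"):
    """Return the stripped lines of a text file, decoded with an explicit codec.

    Hypothetical helper, not from the tutorial; it mirrors the list
    comprehension on line 27 but closes the file via a context manager.
    """
    with open(filename, "r", encoding=encoding) as f:
        return [line.strip() for line in f]


# Hypothetical sample data: U+00C1 encodes to the bytes C3 81 in UTF-8,
# so the file contains the byte 0x81 that cp1252 cannot map.
sample_text = "\u00c1 line one\nline two\n"
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(sample_text.encode("utf-8"))
    sample_path = tmp.name

try:
    # Forcing cp1252 reproduces the error regardless of the platform default.
    try:
        read_lines(sample_path, encoding="cp1252")
        cp1252_failed = False
    except UnicodeDecodeError:
        cp1252_failed = True

    # Decoding as UTF-8 reads the same bytes without error.
    utf8_lines = read_lines(sample_path)
finally:
    os.remove(sample_path)

print(cp1252_failed, len(utf8_lines))
```

Passing the encoding explicitly makes the script behave the same on Windows, Linux, and macOS, instead of depending on whatever the local default happens to be. If the file turns out not to be UTF-8, the same call will raise a UnicodeDecodeError for that codec instead, and the right encoding has to be found another way.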

goddice, Aug 21 '18 06:08

Thank you

terminsen, Aug 26 '18 13:08