Agile_Data_Code
UnicodeDecodeError running test_avro.py on Windows
Hello, I get a UnicodeDecodeError when running test_avro.py on Windows 7. Here is a copy of my Windows terminal output. Thank you for your help:
C:\Users\me\Documents\Agile_Data>python test_avro.py
Traceback (most recent call last):
  File "test_avro.py", line 50, in <module>
    for record in df_reader:
  File "c:\Python27\lib\site-packages\avro\datafile.py", line 362, in next
    datum = self.datum_reader.read(self.datum_decoder)
  File "c:\Python27\lib\site-packages\avro\io.py", line 445, in read
    return self.read_data(self.writers_schema, self.readers_schema, decoder)
  File "c:\Python27\lib\site-packages\avro\io.py", line 490, in read_data
    return self.read_record(writers_schema, readers_schema, decoder)
  File "c:\Python27\lib\site-packages\avro\io.py", line 690, in read_record
    field_val = self.read_data(field.type, readers_field.type, decoder)
  File "c:\Python27\lib\site-packages\avro\io.py", line 468, in read_data
    return decoder.read_utf8()
  File "c:\Python27\lib\site-packages\avro\io.py", line 233, in read_utf8
    return unicode(self.read_bytes(), "utf-8")
UnicodeDecodeError: 'utf8' codec can't decode byte 0x9c in position 0: invalid start byte
Jean-Baptiste
Working on reproducing, thanks
The problem seems to be with the whitespace in the topic strings. Avro hits a decoding error when it reads the strings that contain spaces. If you remove the spaces from the strings, it runs with no errors:
df_writer.append( {"message_id": 11, "topic": "Hellogalaxy", "user_id": 1} )
df_writer.append( {"message_id": 12, "topic": "Jimissilly!", "user_id": 1} )
df_writer.append( {"message_id": 23, "topic": "Ilikeapples.", "user_id": 2} )
df_writer.close()
This does not bode well for the next chapters, so I am looking into avro's io.py file to see if I can change something there, or whether I can do some encoding in the test_avro.py file instead.
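One possible explanation for why removing the spaces helps, offered as a sketch rather than a confirmed diagnosis: Avro stores a string as a zigzag varint length followed by the UTF-8 bytes, so changing a string's length changes the length byte written to the file. The helpers `zigzag_varint` and `encode_avro_string` below are hypothetical names written for illustration and are not part of the avro package. The spaced strings are an assumption, reconstructed from the despaced ones above.

```python
def zigzag_varint(n):
    """Encode a signed 64-bit integer as Avro encodes a long:
    zigzag mapping first, then a base-128 variable-length integer."""
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)  # low 7 bits, continuation bit set
        z >>= 7
    out.append(z)
    return bytes(out)


def encode_avro_string(text):
    """Encode a string as Avro does: length prefix (a long) + UTF-8 bytes."""
    data = text.encode("utf-8")
    return zigzag_varint(len(data)) + data


# Assumed original topics (with spaces) next to the despaced versions.
for topic in ["Hello galaxy", "Jim is silly!", "I like apples.",
              "Hellogalaxy", "Jimissilly!", "Ilikeapples."]:
    prefix = bytearray(encode_avro_string(topic))[0]
    print("%-16r length=%2d prefix=0x%02x" % (topic, len(topic), prefix))
```

A 13-character string gets the length prefix 0x1A, which is Ctrl-Z. A file opened in text mode on Windows under Python 2 treats that byte as end-of-file, so the reader would see truncated data. If the original topic really was "Jim is silly!", that byte disappears once the spaces are removed, which would match the behaviour described above.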
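Try the 'rb' option when opening the file. On Windows, text mode translates line endings and treats byte 0x1A as end-of-file, which corrupts binary Avro data.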
# Create a 'data file' (avro file) reader
df_reader = datafile.DataFileReader(
    open(OUTFILE_NAME, 'rb'),
    rec_reader
)
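The same reasoning applies to the writing side, so the file handed to the writer would need 'wb' as well. As a minimal sketch that does not depend on avro at all, the snippet below round-trips a few bytes that text mode on Windows would mangle (0x1A, LF, CR LF) through a temporary file opened in binary mode. `roundtrip_binary` is a hypothetical helper used only for this illustration.

```python
import os
import tempfile


def roundtrip_binary(payload):
    """Write bytes with 'wb', read them back with 'rb', and return the result."""
    fd, path = tempfile.mkstemp(suffix=".bin")
    os.close(fd)
    try:
        with open(path, "wb") as f:   # binary mode: no newline translation
            f.write(payload)
        with open(path, "rb") as f:   # binary mode: 0x1A is not treated as EOF
            return f.read()
    finally:
        os.remove(path)


# Bytes that text mode on Windows would alter or stop at.
payload = b"\x1a\n\r\n\x9c"
assert roundtrip_binary(payload) == payload
```

If the assertion holds on the machine that shows the error, binary mode is preserving the bytes, and opening both the Avro writer and reader files the same way should be enough.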