
UnicodeDecodeError running test_avro.py on Windows

Open JBPressac opened this issue 11 years ago • 3 comments

Hello, I get a UnicodeDecodeError when running test_avro.py on Windows 7. Here is a copy of my Windows terminal. Thank you for your help:

C:\Users\me\Documents\Agile_Data>python test_avro.py
Traceback (most recent call last):
  File "test_avro.py", line 50, in <module>
    for record in df_reader:
  File "c:\Python27\lib\site-packages\avro\datafile.py", line 362, in next
    datum = self.datum_reader.read(self.datum_decoder)
  File "c:\Python27\lib\site-packages\avro\io.py", line 445, in read
    return self.read_data(self.writers_schema, self.readers_schema, decoder)
  File "c:\Python27\lib\site-packages\avro\io.py", line 490, in read_data
    return self.read_record(writers_schema, readers_schema, decoder)
  File "c:\Python27\lib\site-packages\avro\io.py", line 690, in read_record
    field_val = self.read_data(field.type, readers_field.type, decoder)
  File "c:\Python27\lib\site-packages\avro\io.py", line 468, in read_data
    return decoder.read_utf8()
  File "c:\Python27\lib\site-packages\avro\io.py", line 233, in read_utf8
    return unicode(self.read_bytes(), "utf-8")
UnicodeDecodeError: 'utf8' codec can't decode byte 0x9c in position 0: invalid start byte

Jean-Baptiste

JBPressac avatar Jan 10 '14 19:01 JBPressac

Working on reproducing, thanks

rjurney avatar Jan 12 '14 00:01 rjurney

The problem seems to be the whitespace in the "topic" strings. Avro fails while decoding those strings when it reads them back, so if you remove the spaces from the strings, the script runs with no errors.

df_writer.append( {"message_id": 11, "topic": "Hellogalaxy", "user_id": 1} )
df_writer.append( {"message_id": 12, "topic": "Jimissilly!", "user_id": 1} )
df_writer.append( {"message_id": 23, "topic": "Ilikeapples.", "user_id": 2} )
df_writer.close()

This does not bode well for the next chapters, so I am looking into avro's io.py file to see if I can change something there, or whether I can handle the encoding in test_avro.py instead.

gh4yarli avatar May 06 '14 15:05 gh4yarli
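For readers wondering why removing spaces could make a difference, here is a minimal sketch of how the Avro specification encodes a string: a zigzag varint length prefix followed by the UTF-8 bytes. The helper names `zigzag_varint` and `encode_avro_string` are made up for this illustration and are not part of the avro package, and the spaced topic strings are an assumption (the thread only shows the versions without spaces). The space character is ordinary UTF-8 (0x20), so removing spaces mainly changes the length prefix. For a 13-character string that prefix is 0x1A, a byte that Windows text mode treats as end-of-file, which may be why shortening the strings appeared to help. This is one possible explanation, not something confirmed in the thread.

```python
def zigzag_varint(n):
    """Encode an integer the way Avro encodes a long: zigzag, then base-128 varint."""
    z = (n << 1) ^ (n >> 63)          # zigzag: small magnitudes become small unsigned values
    out = bytearray()
    while z & ~0x7F:                  # more than 7 bits left: emit a continuation byte
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)


def encode_avro_string(text):
    """Avro string: length of the UTF-8 data as a long, followed by the data itself."""
    data = text.encode("utf-8")
    return zigzag_varint(len(data)) + data


# The first three topics are ASSUMED originals; the last three are from the thread.
topics = ["Hello galaxy", "Jim is silly!", "I like apples.",
          "Hellogalaxy", "Jimissilly!", "Ilikeapples."]

for topic in topics:
    encoded = encode_avro_string(topic)
    print("%-16r length=%2d prefix=0x%02X" % (topic, len(topic), encoded[0]))
```

Under these assumptions only "Jim is silly!" (13 characters) produces the 0x1A prefix; the versions without spaces produce 0x16 and 0x18.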

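Try the 'rb' option when opening the file. Avro data files are binary, and on Windows the default text mode translates line endings, which can corrupt the bytes being read.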

# Create a 'data file' (avro file) reader
df_reader = datafile.DataFileReader(
  open(OUTFILE_NAME, 'rb'),
  rec_reader
)

jiujiu18 avatar Jan 11 '16 11:01 jiujiu18
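As a rough illustration of the binary-mode suggestion, the sketch below writes arbitrary bytes with 'wb', reads them back with 'rb', and checks that nothing changed. It is written for Python 3 rather than the Python 2.7 used in the thread, and the payload and the helper names `roundtrip_binary` and `decodes_as_utf8` are made up for the example. It also shows that a lone 0x9C byte, the one named in the traceback, is not a valid start of a UTF-8 sequence, so decoding misaligned or altered bytes can fail in exactly this way.

```python
import os
import tempfile

# Hypothetical payload: bytes that text mode on Windows would alter or stop at
# (0x0A and 0x0D are line-ending bytes, 0x1A is Ctrl-Z).
payload = bytes([0x02, 0x0A, 0x9C, 0x0D, 0x0A, 0x1A, 0x18])


def roundtrip_binary(data):
    """Write data in 'wb' mode and read it back in 'rb' mode, byte for byte."""
    fd, path = tempfile.mkstemp(suffix=".bin")
    os.close(fd)
    try:
        with open(path, "wb") as f:
            f.write(data)
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)


def decodes_as_utf8(data):
    """Return True if data is valid UTF-8, False otherwise."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(roundtrip_binary(payload) == payload)  # binary mode preserves every byte
print(decodes_as_utf8(b"\x9c"))              # a lone continuation byte is invalid
print(decodes_as_utf8(b"Hello galaxy"))      # spaces are ordinary UTF-8
```

If the script also opens the output file for writing without a binary flag, the same reasoning would apply there: 'wb' on the writer side keeps the bytes on disk identical to what Avro produced.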