
Fixes Unicode support in gtfs

Open • fserb opened this issue 14 years ago • 1 comment

Hey, I was trying to use gtfs to parse some UTF-8 data, and it was failing with a weird UnicodeEncodeError. I traced this down to two factors:

  1. unmapped_entities.py was converting string attributes with str(), which tries to encode every unicode value as ASCII.
  2. csv.reader doesn't support unicode input (Python 2's csv module works on byte strings).
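The first factor can be reproduced in a few lines. This is a Python 3 sketch, not the gtfs code: Python 2's str() on a unicode value implicitly used the ASCII codec, so here .encode("ascii") stands in for it. The function names and the stop name are hypothetical.

```python
# Sketch (Python 3): reproduce the ASCII conversion failure in factor 1.
# In Python 2, str(u"...") implicitly encoded with the ASCII codec;
# .encode("ascii") performs the same conversion explicitly.

def convert_attribute_ascii(value):
    """Hypothetical stand-in for the old behaviour: force the value to ASCII."""
    return value.encode("ascii")

def convert_attribute_text(value):
    """Hypothetical stand-in for the fix: keep text values as text."""
    if isinstance(value, str):
        return value
    return str(value)

stop_name = "Estação da Luz"  # hypothetical stop name with non-ASCII characters

try:
    convert_attribute_ascii(stop_name)
except UnicodeEncodeError as exc:
    # The "ç" and "ã" characters are outside the ASCII range, so this fails.
    print("UnicodeEncodeError:", exc)

print(convert_attribute_text(stop_name))
```

Special-casing text values (rather than converting everything with one function) leaves non-string attributes such as numbers on their existing conversion path.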

My first commit changes the test data so that one entry in Stops contains UTF-8 characters, which breaks the tests.

My second commit fixes both issues and makes the tests pass again. To fix 1, I've added a special case for str in unmapped_entities so that it converts with unicode() instead of str(). To fix 2, I've created a unicode_csv_reader function that wraps csv.reader and codecs.iterdecode. The steps here are a bit annoying: iterdecode() the UTF-8 input, encode it back so that csv.reader is fine with it, then take the output from csv.reader and decode it from UTF-8 again, so the final output is unicode.
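For comparison, here is a minimal Python 3 sketch of a unicode_csv_reader in the same spirit; it is not the patch from this issue. It assumes the input is an iterable of UTF-8 encoded byte lines. Because Python 3's csv.reader consumes str directly, the encode-back and decode-again round trip that Python 2 required is unnecessary: codecs.iterdecode alone is enough. The sample stops data is hypothetical.

```python
import codecs
import csv
import io

def unicode_csv_reader(byte_lines, encoding="utf-8", **kwargs):
    """Yield rows of str from an iterable of encoded byte lines.

    codecs.iterdecode decodes the bytes incrementally, and Python 3's
    csv.reader accepts the resulting text lines without re-encoding.
    """
    text_lines = codecs.iterdecode(byte_lines, encoding)
    for row in csv.reader(text_lines, **kwargs):
        yield row

# Hypothetical stops.txt content with a non-ASCII stop name.
data = "stop_id,stop_name\nS1,Estação da Luz\n".encode("utf-8")
rows = list(unicode_csv_reader(io.BytesIO(data)))
print(rows)  # [['stop_id', 'stop_name'], ['S1', 'Estação da Luz']]
```

Using an incremental decoder (rather than decoding each line with bytes.decode) means a multi-byte character split across two input chunks is still decoded correctly.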

Thanks for your attention. Cheers, F.

fserb, Dec 07 '10 01:12

Any chance this will be fixed at some point?

Lawouach, Dec 27 '13 17:12