gtfs
Fixes unicode support on gtfs
Hey, I was trying to use GTFS to parse some utf-8 data, and it was failing with a weird UnicodeEncodeError. I traced this down to two factors:
- unmapped_entities.py was converting string attributes with str() (thus trying to encode all unicode as 'ascii').
- csv.reader doesn't handle unicode input; on Python 2 the csv module only works with byte strings.
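
To illustrate the first factor, here is a minimal sketch of the failure. On Python 2, calling str() on a unicode value implicitly encodes it with the default 'ascii' codec; the explicit encode("ascii") below mimics that step so the snippet also runs on Python 3. The stop name is made-up sample data, not taken from the test fixtures.

```python
# Hypothetical stop name containing a non-ASCII character.
name = u"S\u00e3o Paulo"

try:
    # Mimics the implicit ASCII encode that Python 2's str() performs.
    name.encode("ascii")
    failed = False
except UnicodeEncodeError:
    # The character U+00E3 has no ASCII representation.
    failed = True
```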
My first commit changes the test data so that one entry in Stops has utf-8 characters, which breaks the tests.
My second commit fixes both issues and makes the tests pass again. To fix 1, I've special-cased str in unmapped_entities.py so that those attributes are converted with unicode() instead of str(). To fix 2, I've created a unicode_csv_reader function that wraps csv.reader and codecs.iterdecode. The steps here are a bit annoying: iterdecode() the input from utf-8, encode it back to utf-8 so csv.reader is fine with it, then take the output from csv.reader and decode it from utf-8 again, so the final output is unicode.
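
For readers who want to see the shape of the second fix, the steps can be sketched roughly as follows. This is not the code from the commit: the signature, the `encoding` parameter, and the `PY2` switch are assumptions added so the sketch also runs on Python 3, where csv.reader accepts text directly and the re-encode step is unnecessary.

```python
import codecs
import csv
import sys

PY2 = sys.version_info[0] == 2


def unicode_csv_reader(byte_lines, encoding="utf-8", **kwargs):
    """Yield CSV rows as lists of unicode strings from an iterable of byte lines."""
    # Step 1: decode the incoming bytes to text incrementally.
    text_lines = codecs.iterdecode(byte_lines, encoding)
    if PY2:
        # Step 2: Python 2's csv.reader only accepts byte strings,
        # so re-encode every line as UTF-8 before parsing.
        utf8_lines = (line.encode("utf-8") for line in text_lines)
        for row in csv.reader(utf8_lines, **kwargs):
            # Step 3: decode every parsed cell back to unicode.
            yield [cell.decode("utf-8") for cell in row]
    else:
        # Python 3's csv.reader works on text, so no round trip is needed.
        for row in csv.reader(text_lines, **kwargs):
            yield row


# Made-up sample data standing in for a stops file.
sample = [b"stop_id,stop_name\n", u"S1,S\u00e3o Paulo\n".encode("utf-8")]
rows = list(unicode_csv_reader(sample))
```

Re-encoding as UTF-8 in step 2 regardless of the source encoding keeps the CSV delimiters and quotes as single ASCII bytes, which is what Python 2's csv.reader expects.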
Thanks for your attention, []s F.
Any chance this is fixed at some point?