gtfspy

Issue reading in Zip-file

Open BeishuizenTimKPMG opened this issue 5 years ago • 5 comments

Not all zip files can be used to create the SQLite file.

The Dutch public transport feed cannot be imported directly from the zip file, only after unzipping it. (data from www.openOV.nl)

Not a major issue, but this bug is a minor annoyance for users. Maybe a good case study for bug testing?

BeishuizenTimKPMG avatar Mar 04 '19 14:03 BeishuizenTimKPMG

Hi @BeishuizenTimKPMG,

Can you provide a sample of your code together with the error message, so that we can understand the problem in more detail?

rmkujala avatar Mar 05 '19 08:03 rmkujala

The code is taken directly from the example in "gtfspy/examples/example_temporal_distance_profile.py". The error occurs in the following call:

import_gtfs.import_gtfs([imported_data_path],processed_data_path)

On this line, imported_data_path points directly to the previously mentioned zip file from www.openOV.nl, and the file cannot be loaded.

The following is printed:

Beginning AgencyLoader
Importing agency.txt into agencies for
Indexing agencies
Post-import agency.txt into agencies
Beginning RouteLoader
Importing routes.txt into routes for
Indexing routes
Beginning MetadataLoader
Indexing metadata
Beginning CalendarLoader
calendar.txt missing in {'zipfile': '../data/raw/gtfs-nl.zip', 'zip_commonprefix': ''}
Indexing calendar
Beginning CalendarDatesLoader
Importing calendar_dates.txt into calendar_dates for
Beginning ShapeLoader
Importing shapes.txt into shapes for
Indexing shapes
Post-import shapes.txt into shapes
Beginning FeedInfoLoader
Importing feed_info.txt into feed_info for
Beginning StopLoader
Importing stops.txt into stops for
Indexing stops
Post-import stops.txt into stops
Beginning TransfersLoader
Not importing transfers.txt into transfers for
Beginning StopDistancesLoader
Post-import None into stop_distances
Calculating straight-line transfer distances
Copying information from transfers to stop_distances.
Beginning TripLoader
Importing trips.txt into trips for
Indexing trips
Beginning StopTimesLoader
Importing stop_times.txt into stop_times for

And the following error occurs:


AttributeError Traceback (most recent call last)
in
      1 # Not needed to rerun, is for accessing data
----> 2 import_gtfs.import_gtfs([imported_data_path],processed_data_path)

~/Projects/GemeenteAmsterdam/gtfspy-master/gtfspy/import_gtfs.py in import_gtfs(gtfs_sources, output, preserve_connection, print_progress, location_name, **kwargs)
    102
    103 for loader in loaders:
--> 104     loader.import_(conn)
    105
    106 # Do any operations that require all tables present.

~/Projects/GemeenteAmsterdam/gtfspy-master/gtfspy/import_loaders/table_loader.py in import_(self, conn)
    355 # This does insertions
    356 if self.mode in ('all', 'import') and self.fname and self.exists() and self.table not in ignore_tables:
--> 357     self.insert_data(conn)
    358 # This makes indexes in the DB.
    359 if self.mode in ('all', 'index') and hasattr(self, 'index'):

~/Projects/GemeenteAmsterdam/gtfspy-master/gtfspy/import_loaders/table_loader.py in insert_data(self, conn)
    295 from itertools import chain
    296 rows = chain([row], self.gen_rows([csv_reader], [prefix]))
--> 297 cur.executemany(stmt, rows)
    298 conn.commit()
    299

~/Projects/GemeenteAmsterdam/gtfspy-master/gtfspy/import_loaders/stop_times_loader.py in gen_rows(self, readers, prefixes)
     23 def gen_rows(self, readers, prefixes):
     24     for reader, prefix in zip(readers, prefixes):
---> 25         for row in reader:
     26             #print row
     27             assert row['arrival_time'] != "", "Some stop_times entries is missing arrival time information."

~/Projects/GemeenteAmsterdam/gtfspy-master/gtfspy/import_loaders/table_loader.py in (.0)
    217 csv_reader_stripped = (dict((k, (v.strip() if v is not None else None)) # v is not always a string
    218     for k, v in row.items())
--> 219     for row in csv_reader)
    220 csv_reader_generators.append(csv_reader_stripped)
    221 except TypeError as e:

~/Projects/GemeenteAmsterdam/gtfspy-master/gtfspy/import_loaders/table_loader.py in (.0)
    216 # The following results in a generator, the complicated
    217 csv_reader_stripped = (dict((k, (v.strip() if v is not None else None)) # v is not always a string
--> 218     for k, v in row.items())
    219     for row in csv_reader)
    220 csv_reader_generators.append(csv_reader_stripped)

AttributeError: 'list' object has no attribute 'strip'

As you can see, the error is a type mismatch: .strip() is called on a list where a string was expected. As I said before, importing works after unpacking the zip, but using the zip directly does not.
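For reference, one way this exact AttributeError can arise with Python's standard csv module is sketched below. It is only a possible cause, not a confirmed diagnosis of the zipped feed: when a data row has more fields than the header, csv.DictReader collects the surplus fields into a list stored under the key None. The sample data and the helper name strip_values are made up for illustration; the stripping expression mirrors the one shown at table_loader.py line 217 in the traceback.

```python
import csv
import io

# Hypothetical two-column file whose data row has one field too many.
sample = io.StringIO("stop_id,stop_name\nS1,Central,extra\n")

rows = list(csv.DictReader(sample))
row = rows[0]

# Surplus fields end up in a list under the key None (the default restkey).
surplus = row[None]
print(surplus)


def strip_values(row):
    # Same shape as the expression in the traceback: strip every non-None value.
    return dict((k, (v.strip() if v is not None else None)) for k, v in row.items())


try:
    strip_values(row)
    message = ""
except AttributeError as exc:
    # The list stored under None has no .strip() method.
    message = str(exc)
    print(message)
```

If this is what happens, the offending row would be one where the CSV reader sees more delimiters than the header declares.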

BeishuizenTimKPMG avatar Mar 05 '19 08:03 BeishuizenTimKPMG

I have the same problem when reading files for Prague transport:

/home/miska/PycharmProjects/prague_public_transport_app/venv/bin/python /home/miska/PycharmProjects/prague_public_transport_app/search_stops/import_gtfs_data.py
Beginning AgencyLoader
Importing agency.txt into agencies for 
Indexing agencies
Post-import agency.txt into agencies
Beginning RouteLoader
Importing routes.txt into routes for 
Indexing routes
Beginning MetadataLoader
Indexing metadata
Beginning CalendarLoader
Importing calendar.txt into calendar for 
Indexing calendar
Beginning CalendarDatesLoader
Importing calendar_dates.txt into calendar_dates for 
Beginning ShapeLoader
Importing shapes.txt into shapes for 
Indexing shapes
Post-import shapes.txt into shapes
Beginning FeedInfoLoader
Importing feed_info.txt into feed_info for 
Beginning StopLoader
Importing stops.txt into stops for 
Traceback (most recent call last):
  File "/home/miska/PycharmProjects/prague_public_transport_app/search_stops/import_gtfs_data.py", line 52, in <module>
    load_or_import_example_gtfs(verbose=True)
  File "/home/miska/PycharmProjects/prague_public_transport_app/search_stops/import_gtfs_data.py", line 20, in load_or_import_example_gtfs
    location_name="Prague")
  File "/home/miska/PycharmProjects/prague_public_transport_app/venv/lib/python3.6/site-packages/gtfspy/import_gtfs.py", line 104, in import_gtfs
    loader.import_(conn)
  File "/home/miska/PycharmProjects/prague_public_transport_app/venv/lib/python3.6/site-packages/gtfspy/import_loaders/table_loader.py", line 357, in import_
    self.insert_data(conn)
  File "/home/miska/PycharmProjects/prague_public_transport_app/venv/lib/python3.6/site-packages/gtfspy/import_loaders/table_loader.py", line 297, in insert_data
    cur.executemany(stmt, rows)
  File "/home/miska/PycharmProjects/prague_public_transport_app/venv/lib/python3.6/site-packages/gtfspy/import_loaders/stop_loader.py", line 14, in gen_rows
    for row in reader:
  File "/home/miska/PycharmProjects/prague_public_transport_app/venv/lib/python3.6/site-packages/gtfspy/import_loaders/table_loader.py", line 219, in <genexpr>
    for row in csv_reader)
  File "/home/miska/PycharmProjects/prague_public_transport_app/venv/lib/python3.6/site-packages/gtfspy/import_loaders/table_loader.py", line 218, in <genexpr>
    for k, v in row.items())
AttributeError: 'list' object has no attribute 'strip'

Is there some workaround to this issue?
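A possible workaround, based on the report above that importing works after unzipping, is to extract the archive first and pass the extracted directory to the importer. The sketch below uses only the standard library; the helper name extract_gtfs_zip and the demo archive are hypothetical, and the final gtfspy call is left as a comment because it assumes gtfspy is installed and accepts a directory as a source.

```python
import os
import tempfile
import zipfile


def extract_gtfs_zip(zip_path, target_dir):
    """Extract every member of zip_path into target_dir and return target_dir."""
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target_dir)
    return target_dir


# Self-contained demonstration with a tiny stand-in archive.
work_dir = tempfile.mkdtemp()
demo_zip = os.path.join(work_dir, "demo-gtfs.zip")
with zipfile.ZipFile(demo_zip, "w") as archive:
    archive.writestr("stops.txt", "stop_id,stop_name\nS1,Central\n")

extracted_dir = extract_gtfs_zip(demo_zip, os.path.join(work_dir, "unzipped"))
extracted_files = sorted(os.listdir(extracted_dir))
print(extracted_files)

# Assumption: with gtfspy installed, the directory is passed in place of the zip path:
# import_gtfs.import_gtfs([extracted_dir], "some.db")
```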

evelyn9191 avatar Jan 19 '20 11:01 evelyn9191

Looks like one of the values in a CSV row is being turned into a list instead of a string. I guess the reader has done something clever.

Can you look into stops.txt and see if anything looks weird there? And/or use a debugger to try to figure out the bad line and value? That would help us understand what is going on...

Otherwise, can you give a link to the exact file you are using and the exact command line used?
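As a rough sketch of such a check without a debugger, the snippet below scans a CSV stream with csv.DictReader and reports every value that is neither a string nor None, together with the line number at which it was read. The function name find_non_string_values and the sample data are made up for illustration; this is not part of gtfspy.

```python
import csv
import io


def find_non_string_values(text_stream):
    """Yield (line_number, key, value) for each value that is neither str nor None."""
    reader = csv.DictReader(text_stream)
    for row in reader:
        for key, value in row.items():
            if value is not None and not isinstance(value, str):
                # reader.line_num is the number of source lines read so far.
                yield reader.line_num, key, value


# Hypothetical sample: the second data row has one field too many.
sample = io.StringIO("stop_id,stop_name\nS1,Central\nS2,North,extra\n")
problems = list(find_non_string_values(sample))
for line_number, key, value in problems:
    print(line_number, key, value)
```

For a real feed, the stream could be opened with open("stops.txt", newline="", encoding="utf-8-sig"), where the file name and encoding are assumptions about the feed.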

rkdarst avatar Mar 02 '20 23:03 rkdarst

I can no longer replicate this issue, as it was solved by https://github.com/CxAalto/gtfspy/pull/24. At the time, I thought the issue was connected with special characters that caused a single string to be split into several pieces (i.e. creating something like ["/ax instead of š).

I am using a GTFS zip file that can be downloaded here, and I ran the script with import_gtfs.import_gtfs(["..\\data\\traffic_source.zip"], "some.db", print_progress=verbose, location_name="Prague")

evelyn9191 avatar Mar 03 '20 07:03 evelyn9191