
SEPTA: ZIP file contains multiple ZIP archives within

Open mlundblad opened this issue 1 year ago • 4 comments

Issue description: The GTFS feed for SEPTA (Southeastern Pennsylvania Transportation Authority), obtained via Transitland, is a single ZIP file that contains two GTFS ZIP archives.

There are two links, one for the bus feed and one for the rail feed.

Last update of GTFS Feed: 2024-09-07

Hash of the GTFS Feed:
SHA1: adb983d5fae46af17e07ae8ae31423b2a91b6916
SHA1: da7a6dc4e8f83f9b6dd4b1dc1e984b56a25c96b5

GTFS Feed Download Link:
https://github.com/septadev/GTFS/releases/latest/download/gtfs_public.zip#google_rail.zip
https://github.com/septadev/GTFS/releases/latest/download/gtfs_public.zip#google_bus.zip

Corresponding Transitland pages:
https://www.transit.land/feeds/f-dr4-septa~rail
https://www.transit.land/feeds/f-dr4-septa~bus

mlundblad avatar Sep 20 '24 07:09 mlundblad

Actually, the "anchor part" (after the #) corresponds to the file name of the archive inside the "outer" ZIP. So maybe the intention is that the parser treats it as an "address" into the ZIP…
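To illustrate that reading of the URL, here is a minimal sketch that splits a feed URL into the download URL and the name of the inner archive. The helper name `split_feed_url` is hypothetical, and treating the fragment as an inner file name is the assumption discussed above, not documented SEPTA or Transitland behaviour.

```python
from urllib.parse import urldefrag


def split_feed_url(url):
    """Split a feed URL into (download_url, inner_archive_name).

    Assumption: the part after '#' names an archive inside the outer ZIP.
    inner_archive_name is None when the URL has no fragment.
    """
    download_url, fragment = urldefrag(url)
    return download_url, fragment or None


outer_url, inner_name = split_feed_url(
    "https://github.com/septadev/GTFS/releases/latest/download/"
    "gtfs_public.zip#google_rail.zip"
)
# outer_url is the URL to download; inner_name is "google_rail.zip"
```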

mlundblad avatar Sep 20 '24 07:09 mlundblad

Hi @mlundblad!

Thanks for reporting this issue here. I was not aware that @septadev already had a GTFS GitHub repository that they use to publish their feeds and to track issues people have with them. That's great, and significantly better than all the other agencies I know.

I suggest opening an issue directly there, as they surely monitor their repo.

hbruch avatar Sep 21 '24 06:09 hbruch

It seems this might be intended by SEPTA: https://github.com/septadev/GTFS/issues/14

In the meantime, I tested implementing support for treating the "trailing path" after the # in the URL as a "sub ZIP file": the downloaded ZIP is opened, and the "addressed" inner ZIP is extracted and written out. See:

https://github.com/public-transport/transitous/pull/518
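For readers who want to see the idea in isolation, the following is a rough sketch of pulling an "addressed" inner ZIP out of an outer ZIP using only the Python standard library. It is not the code from the pull request above; the function name `extract_inner_zip` and the choice to work on in-memory bytes are assumptions made for illustration.

```python
import io
import zipfile


def extract_inner_zip(outer_bytes, inner_name):
    """Return the raw bytes of the archive `inner_name` stored in the outer ZIP.

    Hypothetical helper: raises KeyError if the outer ZIP has no such member,
    and ValueError if the member is not itself a ZIP archive.
    """
    with zipfile.ZipFile(io.BytesIO(outer_bytes)) as outer:
        inner_bytes = outer.read(inner_name)
    if not zipfile.is_zipfile(io.BytesIO(inner_bytes)):
        raise ValueError(f"{inner_name!r} is not a ZIP archive")
    return inner_bytes


def write_inner_zip(outer_bytes, inner_name, target_path):
    """Write the addressed inner ZIP to `target_path` for a normal GTFS import."""
    with open(target_path, "wb") as target:
        target.write(extract_inner_zip(outer_bytes, inner_name))
```

With the SEPTA download, `inner_name` would be the part after the `#`, for example `google_rail.zip`, and the written file could then be handled like any single-feed GTFS ZIP.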

mlundblad avatar Sep 21 '24 07:09 mlundblad

Aha, and actually there seem to be direct links as well (not via the GitHub page).

So, maybe we should just use an HTTP source instead.

mlundblad avatar Sep 21 '24 07:09 mlundblad