Problems with the stops data

Open jaimeorrego opened this issue 6 years ago • 7 comments

Hello Will,

Thank you for the API; it is very nice. I was testing it with different cities around the world. It worked very well in Portland, OR, but I found two issues in other cities. I tested in Lisbon, Portugal using:

python transitflow.py --name=lisbon --bbox=-9.276933,38.592729,-8.940477,38.803201 --clip_to_bbox

When downloading the transit operators, the API finds routes and stops but 0 schedule stop pairs. Here is an example from one of the largest operators:

o-eyck-carris 7 / 8
http://transit.land/api/v1/routes?per_page=10000&operated_by=o-eyck-carris
217 routes found.

http://transit.land/api/v1/stops?per_page=10000&served_by=o-eyck-carris
2093 stops found.
 
http://transit.land/api/v1/schedule_stop_pairs?date=2018-04-26&per_page=10000&sort_min_id=0&operator_onestop_id=o-eyck-carris
0 schedule stop pairs found.

Another test I did was in Santiago, Chile. Here the API has problems downloading the stops data.

python transitflow.py --name=Santiago --bbox=-70.673777,-33.460993,-70.595499,-33.394518 --clip_to_bbox

And it seems it cannot connect:

o-66jc-transantiago 2 / 2
http://transit.land/api/v1/routes?per_page=10000&operated_by=o-66jc-transantiago
383 routes found.

http://transit.land/api/v1/stops?per_page=10000&served_by=o-66jc-transantiago
retry 1 / 5: HTTP Error 504: Gateway Time-out
retry 2 / 5: HTTP Error 504: Gateway Time-out
retry 3 / 5: HTTP Error 504: Gateway Time-out
retry 4 / 5: HTTP Error 504: Gateway Time-out
retry 5 / 5: HTTP Error 504: Gateway Time-out
failed:
HTTP Error 504: Gateway Time-out
1 operators successfully downloaded.
1 operators failed.

I think that in the Lisbon case it may be a problem with the structure of the GTFS data, and in Santiago maybe the file is too large?

Do you have any clues?

Thanks!

jaimeorrego, Apr 26 '18 20:04

Thanks for noting these issues, @jaimeorrego.

I can confirm the same errors for Lisbon and Santiago. I believe this is happening because large bus systems have a lot of stop_times to download, and the API is stalling with so many big requests.

I tried decreasing the API request size from 10,000 items per page to 1,000 items per page, and this seemed to help things! There are 10x more API requests, but each is 10x smaller. I also increased the API retry limit from 5 to 20, just in case.
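A rough sketch of that change, for illustration only; it is not the actual transitflow.py code. The function names `build_url` and `fetch_json`, the `opener` parameter, and the one-second wait are assumptions. The URL pattern, the page size of 1,000, and the retry limit of 20 come from this thread.

```python
import json
import time
import urllib.error
import urllib.parse
import urllib.request

BASE_URL = "http://transit.land/api/v1/"  # endpoint root seen in the logs above


def build_url(endpoint, per_page=1000, **params):
    """Build a request URL such as .../stops?per_page=1000&served_by=..."""
    query = urllib.parse.urlencode({"per_page": per_page, **params})
    return f"{BASE_URL}{endpoint}?{query}"


def fetch_json(url, retry_limit=20, wait_seconds=1.0, opener=urllib.request.urlopen):
    """Fetch one page, retrying on HTTP errors such as 504 Gateway Time-out.

    `opener` is a hypothetical hook so the function can be tried without
    network access; by default it is urllib's urlopen.
    """
    for attempt in range(1, retry_limit + 1):
        try:
            with opener(url) as response:
                return json.load(response)
        except urllib.error.HTTPError as error:
            print(f"retry {attempt} / {retry_limit}: {error}")
            time.sleep(wait_seconds)  # assumed pause before the next attempt
    raise RuntimeError(f"failed after {retry_limit} attempts: {url}")
```

Smaller pages mean more requests overall, but each one is less likely to hit the gateway timeout.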

Santiago looks better:

[Screenshot: 2018-04-27 at 8:21:55 pm]

Strangely, for Lisbon, it fails for me on today's date, but if I try this past Wednesday's date, the stop times for o-eyck-carris do successfully download:

transitflow will$ python transitflow.py --name=lisbon --bbox=-9.276933,38.592729,-8.940477,38.803201 --clip_to_bbox --date=2018-04-23

[Screenshot: 2018-04-27 at 8:36:00 pm]

I think I will add a new command line argument --per_page to allow for the user to determine the number of items per page of each API request, as well as --retrylimit.
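Something along these lines, as a sketch only: the flags `--name`, `--bbox`, `--clip_to_bbox`, and `--date` are the ones used in the commands above, `--per_page` and `--retrylimit` are the proposed additions, and the defaults (1000 and 20) and the `build_parser` helper are assumptions rather than the tool's actual code.

```python
import argparse


def build_parser():
    """Hypothetical sketch of the transitflow.py command line options."""
    parser = argparse.ArgumentParser(description="transitflow options (sketch)")
    # Flags that already appear in the commands in this thread
    parser.add_argument("--name")
    parser.add_argument("--bbox")
    parser.add_argument("--clip_to_bbox", action="store_true")
    parser.add_argument("--date")
    # Proposed flags; the default values are assumptions
    parser.add_argument("--per_page", type=int, default=1000,
                        help="items per page for each API request")
    parser.add_argument("--retrylimit", type=int, default=20,
                        help="maximum retries per API request")
    return parser
```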

Does this sound good to you?

Best, Will

willgeary, Apr 28 '18 00:04

Thanks, this is very helpful. I have been having both of the issues above while working on the Toronto, Canada area. The TTC operator seems to be too large and fails for all dates I have tried, even with the API query set to 1000. Could you test this on your end? The error I get is "[Errno 34] Result too large". Thanks, I love this tool!

python transitflow.py --name=TTC --operator=o-dpz8-ttc

python transitflow.py --name=Toronto --bbox=-79.472351,43.597798,-79.280777,43.709083 --clip_to_bbox

AnthonyLovesBikes, Apr 29 '18 02:04

Thanks @AnthonyLovesBikes, I can confirm the same error for Toronto area. Yes, the TTC operator seems to be too large. Although, I have seen at least one example of somebody using this tool to visualize Toronto transit flows (they even wrote a program to convert transit frequency into audio!): See: https://rami-codes.github.io/2017/11/07/transitland-audiolizer/

Frankly, I am not sure if downloading massive schedules via the paginated transitland API is the best approach. It is much faster to download the raw GTFS zip file and process it locally with a python script. I would love to add a "drag and drop" capability to this tool, such that a user could decide to use the transitland API or to use a local GTFS zip file. Any thoughts on this functionality are welcome!
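To make the local approach concrete, here is a minimal sketch using only the standard library. It assumes a standard GTFS zip containing `stop_times.txt` with the `trip_id`, `stop_id`, `stop_sequence`, `arrival_time`, and `departure_time` columns defined by the GTFS specification. The helper names `read_gtfs_table` and `stop_pairs` are hypothetical, and the output is only a rough local analogue of the API's schedule stop pairs, not the same data model.

```python
import csv
import io
import zipfile
from collections import defaultdict


def read_gtfs_table(gtfs_zip, filename):
    """Yield rows of one GTFS text file (e.g. stop_times.txt) as dicts."""
    with zipfile.ZipFile(gtfs_zip) as archive:
        with archive.open(filename) as raw:
            # utf-8-sig drops the byte-order mark that some feeds include
            text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
            yield from csv.DictReader(text)


def stop_pairs(gtfs_zip):
    """Return (trip_id, origin, departure, destination, arrival) tuples."""
    by_trip = defaultdict(list)
    for row in read_gtfs_table(gtfs_zip, "stop_times.txt"):
        by_trip[row["trip_id"]].append(row)

    pairs = []
    for trip_id, rows in by_trip.items():
        # Order each trip's stops, then pair every stop with the next one
        rows.sort(key=lambda r: int(r["stop_sequence"]))
        for origin, destination in zip(rows, rows[1:]):
            pairs.append((trip_id,
                          origin["stop_id"], origin["departure_time"],
                          destination["stop_id"], destination["arrival_time"]))
    return pairs
```

`gtfs_zip` can be a file path or an open binary file, since `zipfile.ZipFile` accepts both.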

Best, Will

willgeary, Apr 29 '18 15:04

Thank you! Yes, I agree a manual GTFS adder would be ideal. If it could include multiple agencies, that would be best. I have messaged the other user to ask how they made the TTC viz work; I will let you know if I learn more. I am now having an issue with GO Transit, though that one worked for me before. Can you let me know if it works for you?

AnthonyLovesBikes, Apr 29 '18 15:04

Thank you @willgeary! After changing the request size, it works fine. I am just entering the world of GTFS data, and the drag-and-drop option would definitely be interesting. Maybe this is not exactly the place for it, but it would also be nice to have a GTFS data processor that lets you, after setting some variables, obtain an output.csv (for example, the number of the route). The idea, of course, would be to use the data in other kinds of applications. Thanks!
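One possible reading of that idea, sketched with the standard library only: pick a route and write its stop times to a CSV. This assumes a standard GTFS zip with `trips.txt` (columns `route_id`, `trip_id`) and `stop_times.txt`; the function name `export_route_stop_times` and its parameters are hypothetical and not part of this tool.

```python
import csv
import io
import zipfile


def export_route_stop_times(gtfs_zip, route_id, out_file):
    """Write stop_times rows for one route to an open text file.

    Returns the number of data rows written.
    """
    with zipfile.ZipFile(gtfs_zip) as archive:
        # Collect the trips that belong to the requested route
        with archive.open("trips.txt") as raw:
            trips = csv.DictReader(
                io.TextIOWrapper(raw, encoding="utf-8-sig", newline=""))
            trip_ids = {row["trip_id"] for row in trips
                        if row["route_id"] == route_id}

        # Copy only the stop_times rows of those trips
        with archive.open("stop_times.txt") as raw:
            stop_times = csv.DictReader(
                io.TextIOWrapper(raw, encoding="utf-8-sig", newline=""))
            writer = csv.DictWriter(out_file, fieldnames=stop_times.fieldnames)
            writer.writeheader()
            count = 0
            for row in stop_times:
                if row["trip_id"] in trip_ids:
                    writer.writerow(row)
                    count += 1
    return count
```

To produce an output.csv, open the target with `open("output.csv", "w", newline="")` and pass the file object as `out_file`.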

jaimeorrego, Apr 30 '18 01:04

Great, glad to hear that things are working for you @jaimeorrego.

I agree that a GTFS data processor would be nice. Frankly, I am considering whether that should belong within this project or as a standalone project.

willgeary, May 01 '18 14:05

Hi, I have the same problem as @jaimeorrego with the data for Lisbon. Strangely, though, I can only download the data successfully for weekends or national holidays, perhaps because the frequency of the buses (Carris) is lower then. I tried 1st May, 25th April, and 1st April, and all were successful. I tried 23rd April, a regular day (as it seems @willgeary did, although the screenshot then shows 25th April), and it doesn't fetch the data; neither do other regular days in April. I changed the request size and retry limit as you suggested.

I agree that an option to run data locally would be better.

Thanks for the API!

temospena, May 09 '18 23:05