UK2GTFS
Invalid gtfs for latest ATOC - NA values and unknown stop_ids.
Hello, revisiting this after a while. I'm getting an invalid gtfs for the latest ATOC data.
Data was downloaded 23 Dec, filename is gtfs_ttis222.zip.
Are these easy to remedy?
> gtfs_validate_internal(gtfs)
Warning messages:
1: In gtfs_validate_internal(gtfs) : NA values in stops
2: In gtfs_validate_internal(gtfs) : Unknown stop_id in stop_times
In particular, we have the following NA values:
> gtfs$stops[rowSums(is.na(gtfs$stops)) > 0,]
stop_id stop_code stop_name stop_lat stop_lon
3183 BSPSBUS <NA> Bishops Lydeard Lydeard Arms 51.05540 -3.18876
3185 ABHLJN <NA> Abbeyhill Junction 55.95542 -3.17036
3207 DUNROD <NA> Dunrod 55.91825 -4.84359
3568 NWTLWJ <NA> NEWTON WEST JN 55.81802 -4.14688
3617 SEVT730 <NA> SEVERN TUNNEL SIG NT1730 51.58257 -2.80170
5030 CREWPLP <NA> CREWE UP & DN POTTERY LP 53.07933 -2.41846
5039 CREWUML <NA> CREWE UP MANCHESTER LOOP 53.09331 -2.43398
5182 CWLRSSJ <NA> COWLAIRS SOUTH JN 55.88094 -4.23906
5849 HETNLJN <NA> HEATON LODGE JN 53.67960 -1.71774
6131 DONCLCJ <NA> LOVERSALL CARR JN 53.48415 -1.07357
6388 HOLME <NA> HOLME JN. 52.47127 -0.23738
6504 HORBRYJ <NA> HORBURY JN 53.65918 -1.53116
7111 HAMBLEJ <NA> HAMBLETON EAST JN 53.77646 -1.14664
7112 HAMBLNJ <NA> HAMBLETON NORTH JN 53.78117 -1.15901
7230 EUSKJN <NA> EAST USK JN. 51.58452 -2.96298
7262 HAUGHDJ <NA> HAUGHHEAD JN 55.76989 -4.01281
7382 RTHGNEJ <NA> RUTHERGLEN EAST JN 55.82844 -4.19624
7457 STAN201 <NA> STANSTED AIRPORT SIG L1201 51.88514 0.25569
7458 STANCLJ <NA> STANSTED COOPERS LANE JN 51.88661 0.25793
7940 STSNJN <NA> STENSON JN 52.86534 -1.53425
8174 SWANSLW <NA> SWANSEA LOOP WEST 51.63791 -3.94316
8266 MRRYTNL <NA> ALLANTON LOOP 55.76693 -3.99792
8613 SHEETSJ <NA> SHEET STORES JN 52.88237 -1.27414
8897 SKELTON <NA> SKELTON JN. (YORK) 53.97117 -1.12073
9121 TRENT <NA> TRENT EAST JN 52.88501 -1.26475
9298 SOHAM <NA> SOHAM 52.33420 0.32798
9304 SOKEJN <NA> STOKE JN. 52.83906 -0.58012
9324 NWSTLP <NA> NEWSTEAD LOOP 53.07001 -1.22182
9939 WATSTJN <NA> Water Street Junction 53.47732 -2.25949
9954 NLRT478 <NA> Northallerton Signal Y478 54.34730 -1.43854
9955 OXEN45 <NA> Oxenholme Signal CE45 54.28966 -2.73614
And the following missing stop_ids:
> gtfs$stop_times[gtfs$stop_times$stop_id %!in% gtfs$stops$stop_id,]
trip_id arrival_time departure_time stop_id stop_sequence pickup_type drop_off_type
1155238 61637 16:45:00 16:49:00 SOHA491 4 0 0
1341949 58910 12:45:00 12:49:00 SOHA491 4 0 0
2476719 51599 31:45:00 32:37:00 WMBYEFR 17 0 0
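For anyone reproducing these two checks: `%!in%` is not a base R operator, so it was presumably defined in the session above. Here is a minimal base R sketch of the same two queries. It runs on small made-up data frames that stand in for gtfs$stops and gtfs$stop_times; the stop codes are placeholders, not real values.

```r
# Toy stand-ins for gtfs$stops and gtfs$stop_times (values are illustrative).
stops <- data.frame(
  stop_id   = c("SOHAM", "WMBY", "DUNROD"),
  stop_code = c("X1", "X2", NA),
  stringsAsFactors = FALSE
)
stop_times <- data.frame(
  trip_id = c(61637, 58910, 51599, 51599),
  stop_id = c("SOHA491", "SOHA491", "WMBYEFR", "WMBY"),
  stringsAsFactors = FALSE
)

# Rows of stops that contain at least one NA in any column.
stops_with_na <- stops[rowSums(is.na(stops)) > 0, ]

# Rows of stop_times whose stop_id has no match in stops.
unknown <- stop_times[!(stop_times$stop_id %in% stops$stop_id), ]

print(stops_with_na)
print(unique(unknown$stop_id))
```

Looking at unique(unknown$stop_id) rather than the full rows keeps the output short when one missing stop is used by many trips.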
Thank you!
Upon running the outputted gtfs.zip through OTP, I get
15:26:38.766 ERROR (OTPMain.java:46) An uncaught error occurred inside OTP: io error: entityType=org.onebusaway.gtfs.model.StopTime path=stop_times.txt lineNumber=1155239
org.onebusaway.csv_entities.exceptions.CsvEntityIOException: io error: entityType=org.onebusaway.gtfs.model.StopTime path=stop_times.txt lineNumber=1155239
at org.onebusaway.csv_entities.CsvEntityReader.readEntities(CsvEntityReader.java:161) ~[otp-2.0.0-shaded.jar:1.1]
at org.onebusaway.csv_entities.CsvEntityReader.readEntities(CsvEntityReader.java:120) ~[otp-2.0.0-shaded.jar:1.1]
at org.onebusaway.csv_entities.CsvEntityReader.readEntities(CsvEntityReader.java:115) ~[otp-2.0.0-shaded.jar:1.1]
at org.onebusaway.csv_entities.CsvEntityReader.readEntities(CsvEntityReader.java:108) ~[otp-2.0.0-shaded.jar:1.1]
at org.opentripplanner.graph_builder.module.GtfsModule.loadBundle(GtfsModule.java:239) ~[otp-2.0.0-shaded.jar:1.1]
at org.opentripplanner.graph_builder.module.GtfsModule.buildGraph(GtfsModule.java:130) ~[otp-2.0.0-shaded.jar:1.1]
at org.opentripplanner.graph_builder.GraphBuilder.run(GraphBuilder.java:80) ~[otp-2.0.0-shaded.jar:1.1]
at org.opentripplanner.standalone.OTPMain.startOTPServer(OTPMain.java:123) ~[otp-2.0.0-shaded.jar:1.1]
at org.opentripplanner.standalone.OTPMain.main(OTPMain.java:39) ~[otp-2.0.0-shaded.jar:1.1]
Caused by: org.onebusaway.gtfs.serialization.EntityReferenceNotFoundException: entity reference not found: type=org.onebusaway.gtfs.model.Stop id=SOHA491
at org.onebusaway.gtfs.serialization.GtfsReader.getAgencyForEntity(GtfsReader.java:211) ~[otp-2.0.0-shaded.jar:1.1]
at org.onebusaway.gtfs.serialization.GtfsReader$GtfsReaderContextImpl.getAgencyForEntity(GtfsReader.java:302) ~[otp-2.0.0-shaded.jar:1.1]
at org.onebusaway.gtfs.serialization.mappings.EntityFieldMappingImpl$ConverterImpl.convert(EntityFieldMappingImpl.java:104) ~[otp-2.0.0-shaded.jar:1.1]
at org.onebusaway.gtfs.serialization.mappings.EntityFieldMappingImpl.translateFromCSVToObject(EntityFieldMappingImpl.java:61) ~[otp-2.0.0-shaded.jar:1.1]
at org.onebusaway.csv_entities.IndividualCsvEntityReader.readEntity(IndividualCsvEntityReader.java:131) ~[otp-2.0.0-shaded.jar:1.1]
at org.onebusaway.csv_entities.IndividualCsvEntityReader.handleLine(IndividualCsvEntityReader.java:98) ~[otp-2.0.0-shaded.jar:1.1]
at org.onebusaway.csv_entities.CsvEntityReader.readEntities(CsvEntityReader.java:157) ~[otp-2.0.0-shaded.jar:1.1]
... 8 common frames omitted
Line 1155239 of stop_times.txt reads:
61637,16:45:00,16:49:00,SOHA491,4,0,0
Running gtfs_force_valid on this gtfs object does fix this error, but OTP then complains as follows:
12:55:17.555 ERROR (OTPMain.java:46) An uncaught error occurred inside OTP: io error: entityType=org.onebusaway.gtfs.model.Transfer path=transfers.txt lineNumber=2
org.onebusaway.csv_entities.exceptions.CsvEntityIOException: io error: entityType=org.onebusaway.gtfs.model.Transfer path=transfers.txt lineNumber=2
at org.onebusaway.csv_entities.CsvEntityReader.readEntities(CsvEntityReader.java:161) ~[otp-2.0.0-shaded.jar:1.1]
at org.onebusaway.csv_entities.CsvEntityReader.readEntities(CsvEntityReader.java:120) ~[otp-2.0.0-shaded.jar:1.1]
at org.onebusaway.csv_entities.CsvEntityReader.readEntities(CsvEntityReader.java:115) ~[otp-2.0.0-shaded.jar:1.1]
at org.onebusaway.csv_entities.CsvEntityReader.readEntities(CsvEntityReader.java:108) ~[otp-2.0.0-shaded.jar:1.1]
at org.opentripplanner.graph_builder.module.GtfsModule.loadBundle(GtfsModule.java:239) ~[otp-2.0.0-shaded.jar:1.1]
at org.opentripplanner.graph_builder.module.GtfsModule.buildGraph(GtfsModule.java:130) ~[otp-2.0.0-shaded.jar:1.1]
at org.opentripplanner.graph_builder.GraphBuilder.run(GraphBuilder.java:80) ~[otp-2.0.0-shaded.jar:1.1]
at org.opentripplanner.standalone.OTPMain.startOTPServer(OTPMain.java:123) ~[otp-2.0.0-shaded.jar:1.1]
at org.opentripplanner.standalone.OTPMain.main(OTPMain.java:39) ~[otp-2.0.0-shaded.jar:1.1]
Caused by: org.onebusaway.gtfs.serialization.EntityReferenceNotFoundException: entity reference not found: type=org.onebusaway.gtfs.model.Stop id=ASHFKI
at org.onebusaway.gtfs.serialization.GtfsReader.getAgencyForEntity(GtfsReader.java:211) ~[otp-2.0.0-shaded.jar:1.1]
at org.onebusaway.gtfs.serialization.GtfsReader$GtfsReaderContextImpl.getAgencyForEntity(GtfsReader.java:302) ~[otp-2.0.0-shaded.jar:1.1]
at org.onebusaway.gtfs.serialization.mappings.EntityFieldMappingImpl$ConverterImpl.convert(EntityFieldMappingImpl.java:104) ~[otp-2.0.0-shaded.jar:1.1]
at org.onebusaway.gtfs.serialization.mappings.EntityFieldMappingImpl.translateFromCSVToObject(EntityFieldMappingImpl.java:61) ~[otp-2.0.0-shaded.jar:1.1]
at org.onebusaway.csv_entities.IndividualCsvEntityReader.readEntity(IndividualCsvEntityReader.java:131) ~[otp-2.0.0-shaded.jar:1.1]
at org.onebusaway.csv_entities.IndividualCsvEntityReader.handleLine(IndividualCsvEntityReader.java:98) ~[otp-2.0.0-shaded.jar:1.1]
at org.onebusaway.csv_entities.CsvEntityReader.readEntities(CsvEntityReader.java:157) ~[otp-2.0.0-shaded.jar:1.1]
... 8 common frames omitted
I then ran gtfs_clean on the validated gtfs object; this resulted in the same error.
Noticing that the stop_id=ASHFKI is indeed present in data/tiplocs.rda, I forced an update with remotes::install_github("ITSleeds/UK2GTFS@ed0fd418d90b837fb689bd154cf40a0b95912be5") (the latest commit available), but the ASHFKI issue persists after running gtfs_force_valid again.
I note that this latter issue seems similar to https://github.com/ITSLeeds/UK2GTFS/issues/29.
I have noticed the unknown stop_ids as well. I suppose SOHA491 should be SOHAM and WMBYEFR should be WMBY. I hope there is a better solution than changing those afterwards in stop_times.txt.
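If you do end up patching the IDs afterwards, it can at least be done on the data frame in R rather than by hand in stop_times.txt. This is a rough sketch only. The mapping is the guess from this comment (SOHA491 to SOHAM, WMBYEFR to WMBY) and has not been checked against the timetable, and the data frame is a made-up stand-in for gtfs$stop_times.

```r
# Toy stand-in for gtfs$stop_times (illustrative rows only).
stop_times <- data.frame(
  trip_id = c(61637, 58910, 51599, 51599),
  stop_id = c("SOHA491", "SOHA491", "WMBYEFR", "DUNROD"),
  stringsAsFactors = FALSE
)

# Guessed replacements; an assumption, not verified against the source data.
remap <- c(SOHA491 = "SOHAM", WMBYEFR = "WMBY")

# Replace only the stop_ids that appear in the lookup, leave the rest alone.
hit <- stop_times$stop_id %in% names(remap)
stop_times$stop_id[hit] <- unname(remap[stop_times$stop_id[hit]])

print(stop_times)
```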
Well, I finally got this working but it took a few extra steps.
To summarize what is surely not the most efficient way:
- clone the repo and in R/atoc.R, comment out this line to save all stops. Build, install and import this local version by doing this. If you want to change the name of this library, change the Package: field in your local DESCRIPTION file.
- use atoc2gtfs with standard args.
- use gtfs_force_valid to get rid of the SOHA491 error.
- sort out duplicates in gtfs$stops$stop_id. The ones I removed have stop_ids and stop_codes (PYECRNR, PYE), (SESABUS, ZBU) and (ESJLEDS, XES). (I just removed these afterwards in stops.txt - am new to R)
- some transfers involve stop_ids that don't exist. Remove them with gtfs$transfers <- gtfs$transfers[gtfs$transfers$to_stop_id %in% gtfs$stops$stop_id,] and gtfs$transfers <- gtfs$transfers[gtfs$transfers$from_stop_id %in% gtfs$stops$stop_id,]
- export with gtfs_write.
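The duplicate and transfer steps above can both be done in R before exporting. Below is a minimal sketch on made-up data frames that stand in for gtfs$stops and gtfs$transfers. It assumes that keeping the first row for each stop_id is acceptable, which may not hold if the duplicate rows differ in coordinates or names.

```r
# Toy stand-ins for gtfs$stops and gtfs$transfers (illustrative rows only).
stops <- data.frame(
  stop_id   = c("PYECRNR", "PYECRNR", "SESABUS", "ABHLJN"),
  stop_code = c("PYE", "PYE", "ZBU", NA),
  stringsAsFactors = FALSE
)
transfers <- data.frame(
  from_stop_id = c("PYECRNR", "ASHFKI", "SESABUS"),
  to_stop_id   = c("SESABUS", "PYECRNR", "ASHFKI"),
  stringsAsFactors = FALSE
)

# Keep the first row for each stop_id and drop later duplicates.
stops <- stops[!duplicated(stops$stop_id), ]

# Keep only transfers where both ends exist in stops.
keep <- transfers$from_stop_id %in% stops$stop_id &
  transfers$to_stop_id %in% stops$stop_id
transfers <- transfers[keep, ]

print(stops)
print(transfers)
```

Checking both ends in one logical vector gives the same result as the two separate subsetting calls in the list.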
This can happen when ATOC add new TIPLOCs that are not in the package database.
The CIF files contain the locations of TIPLOCs, but they are often woefully inaccurate, so by default UK2GTFS uses an internal database, which you can inspect with:
library(UK2GTFS)
head(as.data.frame(tiplocs))
If you want to draw the TIPLOC locations from the CIF file you can use locations = "file" in atoc2gtfs or you can provide your own sf data frame of points.
I periodically update the database so I'll have a look and see if new locations are required.
Also, a missing stop_code is quite common with the ATOC data and is not a problem, as stop codes are optional in the GTFS spec.
I've pushed an update that will now check and pull in any missing tiplocs with a warning.
I've also added 10 new tiplocs to the database:
stop_id stop_code stop_name stop_lat stop_lon
113 ACTONTN ZAT ACTON TOWN 51.50273 -0.28114
1236 BRENTX BCZ BRENT CROSS WEST 51.56847 -0.22671
1564 BSTMNR ZBM BOSTON MANOR 51.49529 -0.32608
3485 ELINTN ELT EAST LINTON 55.98421 -2.66029
7020 MSBTN MBT MARSH BARTON 50.70419 -3.52228
8169 PTWYPR PRI PORTWAY PARK AND RIDE 51.48902 -2.68984
8427 RESTSTN RSN RESTON 55.85015 -2.19483
10312 TOTNSSR XSC SALCOMBE SHADYCOMBE ROAD 50.71209 -3.78883
11585 ABARASQ AER ABERAERON 52.24265 -4.25842
11685 CATZ016 LPD LUTON AIRPORT PARKWAY DART 51.87302 -0.39489
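To illustrate the idea of checking for missing tiplocs and pulling them in with a warning, here is a rough sketch. It is not the package's actual code: fill_missing_stops is a hypothetical helper, db stands in for the internal database, and fallback stands in for locations read from the CIF file. The fallback coordinates for SOHAM are made up.

```r
# Hypothetical helper, not part of UK2GTFS: add stops that are needed but
# absent from db, taking them from fallback and warning about each addition.
fill_missing_stops <- function(needed_ids, db, fallback) {
  missing_ids <- setdiff(unique(needed_ids), db$stop_id)
  if (length(missing_ids) == 0) {
    return(db)
  }
  found <- fallback[fallback$stop_id %in% missing_ids, , drop = FALSE]
  warning(
    "Adding ", nrow(found), " stop(s) missing from the database: ",
    paste(found$stop_id, collapse = ", ")
  )
  rbind(db, found)
}

# Stand-in for the internal database.
db <- data.frame(
  stop_id = "SOHAM", stop_lat = 52.33420, stop_lon = 0.32798,
  stringsAsFactors = FALSE
)
# Stand-in for less accurate locations taken from the timetable file.
fallback <- data.frame(
  stop_id  = c("SOHAM", "RESTSTN"),
  stop_lat = c(52.3, 55.85015),
  stop_lon = c(0.3, -2.19483),
  stringsAsFactors = FALSE
)

result <- fill_missing_stops(c("SOHAM", "RESTSTN", "SOHAM"), db, fallback)
print(result)
```

Stops already in db keep their database coordinates; only the genuinely missing ones come from the fallback.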
> Also missing stop_code is quite common with the ATOC data and is not a problem as they are optional in the GTFS spec
Noted. Not good of OTP to complain, then!
I haven't tested again, but I presume your last commit has fixed the ASHFKI issue? That was a stop_id that was present in the included tiplocs data frame but was prematurely filtered out, with this step:
# remove any unused stops
stops <- stops[stops$stop_id %in% stop_times$stop_id, ]
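To make the point concrete, here is a small sketch of how that filter can drop a stop that transfers still references, plus one possible alternative that also keeps stops used by transfers. The data frames are made-up stand-ins, and the alternative is only my suggestion, not necessarily what the package does now.

```r
# Toy stand-ins (illustrative rows only).
stops <- data.frame(
  stop_id = c("SOHAM", "ASHFKI", "DUNROD"),
  stringsAsFactors = FALSE
)
stop_times <- data.frame(stop_id = "SOHAM", stringsAsFactors = FALSE)
transfers <- data.frame(
  from_stop_id = "ASHFKI", to_stop_id = "SOHAM",
  stringsAsFactors = FALSE
)

# Filtering on stop_times alone drops ASHFKI, which transfers still needs.
stops_old <- stops[stops$stop_id %in% stop_times$stop_id, , drop = FALSE]

# Alternative: keep any stop referenced by stop_times or by either end
# of a transfer.
used <- unique(c(
  stop_times$stop_id, transfers$from_stop_id, transfers$to_stop_id
))
stops_kept <- stops[stops$stop_id %in% used, , drop = FALSE]

print(stops_old$stop_id)
print(stops_kept$stop_id)
```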
Thank you for maintaining the library!
Let me know if you have any more problems. Unfortunately missing or bad data is quite common and the only fix is for people to report it.
The package now tracks over 10,000 tiplocs, but there are only about 2,500 stations in the UK, which gives you an idea of how many temporary or intermittent ones are used for things like bus replacement services.