gtfs-validator
report.json doesn't include details of parsing failures
Describe the bug
While validating one of the GTFS feeds included in the {gtfstools} R package (poa.zip), the validator's stderr includes a SEVERE log message that says "Failed to parse some rows in routes.txt". Later, the stderr says:
SEVERE: -----------------------------------------
| !!! PARSING FAILED !!! |
| Most validators were never invoked. |
| Please see report.json for details. |
-----------------------------------------
The report.json file, however, doesn't include anything related to the parsing failure. This is what it looks like:
{
"notices": [
{
"code": "invalid_color",
"severity": "ERROR",
"totalNotices": 4,
"sampleNotices": [
{
"filename": "routes.txt",
"csvRowNumber": 2.0,
"fieldName": "route_text_color",
"fieldValue": "0"
},
{
"filename": "routes.txt",
"csvRowNumber": 3.0,
"fieldName": "route_text_color",
"fieldValue": "0"
},
{
"filename": "routes.txt",
"csvRowNumber": 4.0,
"fieldName": "route_text_color",
"fieldValue": "0"
},
{
"filename": "routes.txt",
"csvRowNumber": 5.0,
"fieldName": "route_text_color",
"fieldValue": "0"
}
]
},
{
"code": "missing_timepoint_column",
"severity": "WARNING",
"totalNotices": 1,
"sampleNotices": [
{
"filename": "stop_times.txt"
}
]
},
{
"code": "stop_time_with_arrival_before_previous_departure_time",
"severity": "ERROR",
"totalNotices": 10,
"sampleNotices": [
{
"csvRowNumber": 5333.0,
"prevCsvRowNumber": 5272.0,
"tripId": "T2-1@1#2310",
"arrivalTime": "00:02:00",
"departureTime": "23:10:00"
},
{
"csvRowNumber": 12153.0,
"prevCsvRowNumber": 12092.0,
"tripId": "T2-1@5#2357",
"arrivalTime": "00:43:00",
"departureTime": "23:57:00"
},
{
"csvRowNumber": 5395.0,
"prevCsvRowNumber": 5334.0,
"tripId": "T2-1@1#2332",
"arrivalTime": "00:24:00",
"departureTime": "23:32:00"
},
{
"csvRowNumber": 14335.0,
"prevCsvRowNumber": 14250.0,
"tripId": "176-1@1#2310",
"arrivalTime": "00:02:00",
"departureTime": "23:10:00"
},
{
"csvRowNumber": 12091.0,
"prevCsvRowNumber": 12030.0,
"tripId": "T2-1@5#2334",
"arrivalTime": "00:20:00",
"departureTime": "23:34:00"
},
{
"csvRowNumber": 12414.0,
"prevCsvRowNumber": 12386.0,
"tripId": "A141-1@3#2340",
"arrivalTime": "00:20:00",
"departureTime": "23:40:00"
},
{
"csvRowNumber": 5457.0,
"prevCsvRowNumber": 5396.0,
"tripId": "T2-1@1#2357",
"arrivalTime": "00:49:00",
"departureTime": "23:57:00"
},
{
"csvRowNumber": 12443.0,
"prevCsvRowNumber": 12415.0,
"tripId": "A141-1@5#2340",
"arrivalTime": "00:20:00",
"departureTime": "23:40:00"
},
{
"csvRowNumber": 9177.0,
"prevCsvRowNumber": 9116.0,
"tripId": "T2-1@2#2357",
"arrivalTime": "00:44:00",
"departureTime": "23:57:00"
},
{
"csvRowNumber": 9115.0,
"prevCsvRowNumber": 9054.0,
"tripId": "T2-1@2#2332",
"arrivalTime": "00:19:00",
"departureTime": "23:32:00"
}
]
},
{
"code": "unknown_column",
"severity": "INFO",
"totalNotices": 1,
"sampleNotices": [
{
"filename": "trips.txt",
"fieldName": "trip_time",
"index": 10.0
}
]
}
]
}
PS: The system_errors.json file is empty.
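For anyone who wants to check a report the same way, here is a minimal sketch that lists the notice groups in a report.json and looks for anything that mentions parsing. It assumes only the structure visible in the output above (`notices`, `code`, `severity`, `totalNotices`). The function names and the keyword heuristic are hypothetical and not part of the validator.

```python
import json

def summarize_notices(report):
    """Return (severity, code, totalNotices) for each notice group in the report."""
    return [
        (notice["severity"], notice["code"], int(notice["totalNotices"]))
        for notice in report.get("notices", [])
    ]

def mentions_parsing(report, keywords=("parse", "parsing")):
    """Hypothetical heuristic: does any notice code contain a parsing-related keyword?"""
    return any(
        keyword in notice["code"]
        for notice in report.get("notices", [])
        for keyword in keywords
    )

# Trimmed copy of the report above (sampleNotices omitted for brevity).
sample_report = json.loads("""
{
  "notices": [
    {"code": "invalid_color", "severity": "ERROR", "totalNotices": 4},
    {"code": "missing_timepoint_column", "severity": "WARNING", "totalNotices": 1},
    {"code": "stop_time_with_arrival_before_previous_departure_time",
     "severity": "ERROR", "totalNotices": 10},
    {"code": "unknown_column", "severity": "INFO", "totalNotices": 1}
  ]
}
""")

summary = summarize_notices(sample_report)
for severity, code, total in summary:
    print(f"{severity:8} {code} ({total})")

# None of the notice codes refers to parsing, which is the mismatch reported here.
print("mentions parsing:", mentions_parsing(sample_report))
```

To run it against a real file, replace the embedded sample with `json.load(open("report.json"))`.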
How we reproduce the bug
Using the latest validator and the feed linked above, just run the validator CLI tool as usual:
java -jar validator_path.jar -i poa.gtfs -o output_dir_path.jar
Expected behaviour
I'd expect either the report.json to include more details about the parsing failures or the stderr not to mention anything about them.
Environment versions
- validator version: 3.1.0
- Java version: openjdk 11.0.15
- OS versions: Ubuntu 20.04
Thank you for reporting a bug. The issue has been placed in triage, and the MobilityData team will follow up on it.
This is what the routes.txt looks like:
route_id,agency_id,route_short_name,route_long_name,route_desc,route_type,route_url,route_color,route_text_color
T2,EPTC,T2,TRANSVERSAL 2,,3,,CFD600,0
A141,EPTC,A141,ALIMENTADORA RESTINGA / LOMBA / 5UNIDADE,,3,,FF0000,0
176,EPTC,176,SERRARIA (RODOVIARIA),,3,,FF0000,0
R10,EPTC,R10,RAPIDA RESTINGA NOVA / CAVALHADA,,3,,FF0000,0
Changing route_text_color from 0 to 000000 fixed the issue.
So my impression is that the validator tries to parse the color as a hexadecimal number and fails because the value is 0. The report doesn't make it clear that the invalid_color ERROR is the cause of the parsing failure.
(By the way, does this issue mean that every time the invalid_color error appears a parsing failure happens? If so, wouldn't the error diagnosis be enough to "ignore" this parsing failure? i.e. you could try to parse it as hexadecimal, if it fails then you'd read it as a character, issue the error on the report and then go on to do the rest of the checks)
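To make the idea in the parenthesis concrete, here is a minimal sketch of a tolerant color parser. It is not how the validator is implemented (the validator is written in Java); the names `parse_color` and `Notice` are hypothetical, and the only outside fact relied on is that the GTFS reference defines a color as a six-digit hexadecimal number. An invalid value produces an invalid_color notice, the raw text is kept, and processing continues.

```python
import re
from dataclasses import dataclass

# GTFS colors are six hexadecimal digits, e.g. "CFD600" or "000000".
HEX_COLOR = re.compile(r"[0-9A-Fa-f]{6}")

@dataclass
class Notice:
    """Hypothetical notice record mirroring the fields shown in report.json."""
    code: str
    severity: str
    filename: str
    csv_row_number: int
    field_name: str
    field_value: str

def parse_color(value, filename, csv_row_number, field_name, notices):
    """Return (rgb, raw). rgb is None when the value is not a valid color.

    Instead of aborting the whole file, an invalid value is recorded as an
    invalid_color notice and the raw string is kept so later checks can run.
    """
    if HEX_COLOR.fullmatch(value):
        return int(value, 16), value
    notices.append(
        Notice("invalid_color", "ERROR", filename, csv_row_number, field_name, value)
    )
    return None, value

notices = []
good = parse_color("CFD600", "routes.txt", 2, "route_color", notices)
bad = parse_color("0", "routes.txt", 2, "route_text_color", notices)
print(good, bad, len(notices))
```

With this approach the row for route T2 would still be loaded, with one notice attached, rather than being dropped as a parsing failure.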
Edit: RULES.md reads
ERROR notices are for items that the GTFS reference specification explicitly requires or prohibits (e.g., using the language "must"). The validator uses RFC2119 to interpret the language in the GTFS spec.
- Please note that this validator also generates System Errors that give information about things that may have gone wrong during the validation process such as the inability to unzip a GTFS file. These are generated in a second report system_errors.json.
My interpretation is that a route color of 0 is a specification error, not a system error such as a parsing failure. Still, since parsing failures definitely mean that something has gone wrong during the validation process, the definition above suggests that they should be reported in system_errors.json. However, that file was empty after the validation, and the stderr suggested that the failure details would be in report.json.
@dhersz Thanks for the report! Have you seen https://github.com/MobilityData/gtfs-validator/issues/1097 and https://github.com/MobilityData/gtfs-validator/issues/1096? I think this might be a duplicate of those.
Thanks for the quick answer @barbeau. Actually, I think it's not exactly a duplicate, although it touches similar topics. I see 3 different things that might be considered problems with the current behaviour:
- The error output says that a parsing failure happened and that more details are in report.json. However, report.json doesn't include any parsing failure details. So either the output is wrong and report.json is correct (no parsing failure happened), which doesn't seem to be the case, or the output is correct but report.json is not (a parsing failure happened, but the details are not listed in report.json, as they should be).
- The parsing failure that happened seems to be related to route_text_color. The same issue that causes the parsing failure, however, is already reported as an ERROR in the report. So in my opinion the validator should be able to "recover" from this parsing failure and keep the validation going (i.e. the parsing failure should not happen; in pseudo-code you could do something like try(read_color_as_hexadecimal(), fallback = read_color_as_string()), or something like that).
- If we decide that the validator should indeed fail when trying to read colors, this failure should be logged in system_errors.json, and not in report.json. More generally, any parsing failures should be logged in system_errors.json (#1096 shows another situation of a parsing failure that would have been logged in report.json).
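As a rough illustration of how these three points could fit together, here is a minimal sketch. Everything in it is hypothetical (`FieldParseError`, `load_rows`, `summary_message`, and the `parsing_failure` code are made up for the example) and it does not reflect the validator's actual Java code. Data problems become notices for report.json, unexpected failures become entries for system_errors.json, and the log message only points at files that actually contain details.

```python
class FieldParseError(Exception):
    """Hypothetical: a cell violates the spec (a data problem, not a crash)."""
    def __init__(self, code, field_name, field_value):
        super().__init__(f"{code}: {field_name}={field_value!r}")
        self.code = code
        self.field_name = field_name
        self.field_value = field_value

def load_rows(filename, rows, parse_row):
    """Parse every row; never stop at the first failure."""
    parsed, notices, system_errors = [], [], []
    for csv_row_number, row in enumerate(rows, start=2):  # row 1 is the header
        try:
            parsed.append(parse_row(row))
        except FieldParseError as err:
            # Spec violation: belongs in report.json.
            notices.append({
                "code": err.code, "severity": "ERROR", "filename": filename,
                "csvRowNumber": csv_row_number,
                "fieldName": err.field_name, "fieldValue": err.field_value,
            })
        except Exception as err:
            # Anything unexpected: belongs in system_errors.json.
            system_errors.append({
                "code": "parsing_failure", "filename": filename,
                "csvRowNumber": csv_row_number, "message": str(err),
            })
    return parsed, notices, system_errors

def summary_message(notices, system_errors):
    """Only mention a file if it really contains details."""
    targets = []
    if notices:
        targets.append("report.json")
    if system_errors:
        targets.append("system_errors.json")
    if not targets:
        return "No problems found."
    return "Problems found. Please see " + " and ".join(targets) + " for details."

def parse_route(row):
    """Toy row parser used only for this example."""
    if len(row["route_text_color"]) != 6:
        raise FieldParseError("invalid_color", "route_text_color", row["route_text_color"])
    return row

rows = [
    {"route_id": "T2", "route_text_color": "0"},
    {"route_id": "R10", "route_text_color": "000000"},
]
parsed, notices, system_errors = load_rows("routes.txt", rows, parse_route)
message = summary_message(notices, system_errors)
print(message)
```

The point of `summary_message` is the first bullet: the stderr text is derived from what was actually recorded, so it cannot promise details that are missing from the file it names.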
It's my understanding that the two issues you linked seem to advocate for more clarity on which validators were invoked or not. While I do agree that this is important, I think the 3 topics I mention above go beyond this subject.
Sorry if I wasn't clear in the beginning. I hope this answer helps clarify my main concerns with the current behaviour.
Hi @dhersz, thanks for opening this and sharing your thoughts above. I agree the logic we currently have here needs to be improved. We are looking into this issue this quarter 🙂
Update on this issue: it has been postponed to the next quarter.
Related to #1097 and #1096
Hi @dhersz! Thanks for the detailed description of this issue. I opened a PR with a suggested solution.
Since notices specific to parsing failures are already present in the validation reports (HTML and JSON), we removed all the confusing logs related to parsing failures, as shown below.
In the case of the poa.zip dataset, the notice related to the parsing failure is invalid_color.
SEVERE: -----------------------------------------
| Some validators were never invoked. |
-----------------------------------------
Skipped validators: FareAttributeAgencyIdValidator,GtfsAttributionAgencyIdForeignKeyValidator,GtfsAttributionRouteIdForeignKeyValidator,GtfsFareAttributeAgencyIdForeignKeyValidator,GtfsFareLegRuleNetworkIdForeignKeyValidator,GtfsFareRuleRouteIdForeignKeyValidator,GtfsRouteAgencyIdForeignKeyValidator,GtfsTransferFromRouteIdForeignKeyValidator,GtfsTransferToRouteIdForeignKeyValidator,GtfsTripRouteIdForeignKeyValidator,MatchingFeedAndAgencyLangValidator,RouteAgencyIdValidator,ShapeToStopMatchingValidator,StopTimeTravelSpeedValidator,TranslationFieldAndReferenceValidator,UrlConsistencyValidator
Please take a look at your convenience and let me know if you have any comments.