False positives of missing_required_field when encountering empty transfer_type values in transfers.txt
Bug report
The GTFS Validator is generating false positives of the missing_required_field when it encounters empty transfers.txt#transfer_type field values.
Describe the bug
Given an empty value within the transfers.txt#transfer_type field (which is valid according to the spec), when the validator checks the file, then it will falsely raise a missing_required_field notice.
How we reproduce the bug
Example feed: Archive.zip
In this feed in the transfers.txt file, the following use case is shown:
from_stop_id,to_stop_id,transfer_type,min_transfer_time
1,2,,
This is a valid use case in that the transfer_type value is empty which indicates a "Recommended transfer point between routes".
Expected behaviour
The validator should not flag this as an error.
Observed behaviour
The GTFS Validator flagged the above-mentioned use case as an error.
Screenshots:
See relevant JSON output for the above-linked feed.
{
"notices": [{
"code": "missing_required_field",
"severity": "ERROR",
"totalNotices": 1,
"sampleNotices": [{
"filename": "transfers.txt",
"csvRowNumber": 2.0,
"fieldName": "transfer_type"
}]
},
...
Environment versions
- validator version: v2, v3.0.1, v3.1.0
- Java version:
- OS versions:
Thank you for your reporting a bug. The issue has been placed in triage, the MobilityData team will follow-up on it.
Hello Evan,
Thanks for flagging this! This is super useful, having a reliable validation report is a top priority. 🙂 You are right, this field is considered required in the validator, so the notice is triggered for every row that has an empty value, regardless of whether the column is present or not. 👇 Source code from GtfsTransferSchema.java.
@Required
GtfsTransferType transferType();
This is another example of a slightly blurry definition of what "empty" means in the spec. We have a similar issue with timepoint, and we ended up differentiating between (a) an empty value but an existing column, and (b) an empty column. I think we need to follow a similar logic here.
- empty value but existing column: valid data
| from_stop_id | to_stop_id | transfer_type | min_transfer_time |
|---|---|---|---|
| 1 | 2 |
- empty column: invalid data
| from_stop_id | to_stop_id | min_transfer_time |
|---|---|---|
| 1 | 2 |
cc @Cristhian-HA @bdferris @barbeau
@isabelle-dr @bdferris-v2 It seems that @Required annotation is used to validate both that the header exists (DefaultTableHeaderValidator.java) and that the field is non-empty (RowParser.java).
If I understand correctly, the goal here is to generate an error notice if the column header is missing, but no notice if the header is present and the field is empty? Or is it conditional - the field should not be empty if min_transfer_time is non-empty?
@briandonahue that's correct, the goal is to generate a notice if the file transfers.txt is present but the header transfer_type is not in the header of the file. No link with min_transfer_time here.
It looks like @Required in the schema file was not a good approach because it generates missing_required_field for every record that has an empty value in transfer_type.
This "0 or empty" semantic is often used for optional fields in the spec, I believe transfer_type is the only required field that has it. It's not great in terms of interpretation by a machine.
Could we use the MissingRequiredColumn notice for this?
Could we use the
MissingRequiredColumnnotice for this?
@isabelle-dr I believe so. I'll just add a validator that manually checks that the header exists, and throws that notice, and remove the @Required annotation.
@isabelle-dr on further review - I see that there is a GtfsTableContainer.TableStatus.INVALID_HEADERS status for Tables that is supposed to represent if columns are missing, which is tested here. I think a better approach may be to add a new annotation, @RequiredColumn or @RequiredHeader? And potentially rename @Required to @RequiredField or @RequiredValue?
Then we would update the logic in AnyTableLoader.java to check both annotations and add the MissingRequiredColumn notice in either case.
Interesting. I think we also use @Required for files.
Brian is on holiday this week.
@aababilov, any wisdom to share?
I think we also use
@Requiredfor files.
You're correct, I missed that detail. Well, we could still do this and just add a @RequiredHeader annotation and leave the @Required functionality as-is. I'd argue it might be clearer to have separate annotations for Fields, Headers, and Files but I could raise that as a separate issue if it seems useful...