gtfs-validator
Implement duplicated date in service period verification
Is your feature request related to a problem? Please describe. A date should not be duplicated in a service period. This is a GTFS rule implemented in Google Python validator as SetDateHasService.
Describe the solution you'd like Actual Google GTFS validator behaviour: verifies that a date is not duplicated for the same service id. https://github.com/google/transitfeed/blob/d727e97cb66ac2ca2d699a382ea1d449ee26c2a1/transitfeed/serviceperiod.py#L126
Describe alternatives you've considered
Additional context Line 174 in Error support priorities https://docs.google.com/spreadsheets/d/1vqe6wq7ctqk1EhYkgtZ0_TbcQ91vccfs2daSjn20BLE/edit#gid=0
@maximearmstrong ~~I think this rule is for calendar_dates.txt, but only in the case where calendar.txt is empty?~~ EDIT This isn't correct - see my later comment below.
@barbeau The Python validator doesn't seem to check calendar.txt when verifying this. I thought the goal here was to verify whether a date was added twice to calendar_dates.txt. Would it not be necessary if calendar.txt was not empty?
@maximearmstrong You're allowed to have two records in calendar_dates.txt with the same date, as long as the ID and exception type are different.
calendar_dates.txt is used in two ways in GTFS:
- It defines exceptions to service defined in calendar.txt. So you may have weekdays that normally run service_id 1, but if a holiday Dec 25th falls on a Monday, you'd have 2 records here - one saying remove service_id 1 on Dec 25th, and a second one saying add service_id 2 on Dec 25th (for whatever the Dec 25th service is defined by service_id 2).
- If agencies don't use calendar.txt to define service, they can provide a massive dump of dates in calendar_dates.txt that says which service_id is running on which date. In this case, service_id is a primary key for calendar_dates.txt and all service will be exception_type 1 for added.
We just need to make sure we're considering both use cases of calendar_dates.txt when we implement a rule for these tables.
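To make the two use cases concrete, here is a minimal Python sketch of how exceptions in calendar_dates.txt could be applied on top of whatever calendar.txt defines for a date. The helper name `services_on`, the `base_services` parameter, and the sample rows are assumptions for illustration only; this is not code from either validator.

```python
import csv
import io

# Hypothetical calendar_dates.txt rows for the holiday example above:
# remove service_id 1 and add service_id 2 on Dec 25th.
CALENDAR_DATES = """service_id,date,exception_type
1,20231225,2
2,20231225,1
"""


def services_on(date, base_services, calendar_dates_text):
    """Return the service_ids active on `date` after applying exceptions.

    `base_services` holds the service_ids that calendar.txt would run on
    `date`; pass an empty set when the feed does not use calendar.txt.
    """
    active = set(base_services)
    for row in csv.DictReader(io.StringIO(calendar_dates_text)):
        if row["date"] != date:
            continue
        if row["exception_type"] == "1":    # service added
            active.add(row["service_id"])
        elif row["exception_type"] == "2":  # service removed
            active.discard(row["service_id"])
    return active
```

With `base_services={"1"}` this covers the first use case (exceptions to calendar.txt); with an empty set it covers the second, where every row is an added service.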
We usually don't add rules that are not clearly specified in the spec or best practices (this is the case here), but we make exceptions for what the community thinks is valuable to check (point_near_origin, fast_travel_between_consecutive_stops, unused_trip, just to name a few). This is done on a case-by-case basis, by discussing with the community. We are in favor of having the spec modified first before adding the check in the validator.
Coming back to this old issue, my interpretation is that the logic in the Python validator specifically enforced that there could not be multiple entries in calendar_dates.txt with the same combination of (service_id + date). Support for this same check was added to the gtfs-validator in PR #1190 when I added support for multi-column primary keys and the calendar_dates.txt schema was specifically updated to specify (service_id + date) as the primary key. As such, I believe we can close this issue as "implemented".
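As a rough illustration of what a (service_id + date) primary-key check amounts to, the following Python sketch counts each key pair in calendar_dates.txt and reports those seen more than once. It is not the gtfs-validator implementation from PR #1190; the function name `duplicate_service_dates` and the sample data are hypothetical.

```python
import csv
import io
from collections import Counter


def duplicate_service_dates(calendar_dates_text):
    """Return sorted (service_id, date) pairs that occur more than once."""
    rows = csv.DictReader(io.StringIO(calendar_dates_text))
    counts = Counter((row["service_id"], row["date"]) for row in rows)
    return sorted(key for key, n in counts.items() if n > 1)


# Hypothetical feed: service_id 1 is listed twice for the same date.
SAMPLE = """service_id,date,exception_type
1,20231225,2
2,20231225,1
1,20231225,1
"""
```

Here `duplicate_service_dates(SAMPLE)` flags only the pair for service_id 1; the two rows that share a date but have different service_ids are not treated as duplicates.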