
Implement duplicated date in service period verification

Open: maximearmstrong opened this issue 4 years ago • 4 comments

Is your feature request related to a problem? Please describe.
A date should not be duplicated in a service period. This is a GTFS rule implemented in Google's Python validator (transitfeed) as SetDateHasService.

Describe the solution you'd like
Current Google GTFS validator behaviour: it verifies that a date is not duplicated for the same service_id. https://github.com/google/transitfeed/blob/d727e97cb66ac2ca2d699a382ea1d449ee26c2a1/transitfeed/serviceperiod.py#L126
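A minimal sketch of the requested check, assuming dates are already parsed into GTFS `YYYYMMDD` strings and that exceptions are grouped into one in-memory object per service_id. The `ServicePeriod` class and its `set_date_has_service` method are hypothetical names chosen to echo transitfeed's SetDateHasService; this is not transitfeed's actual code.

```python
class ServicePeriod:
    """Hypothetical container for the calendar_dates.txt exceptions of one service_id."""

    def __init__(self, service_id: str) -> None:
        self.service_id = service_id
        # Maps a YYYYMMDD date string to its exception_type (1 = added, 2 = removed).
        self.date_exceptions: dict[str, int] = {}

    def set_date_has_service(self, date: str, exception_type: int) -> None:
        """Record an exception for `date`, rejecting a second entry for the same date."""
        if date in self.date_exceptions:
            raise ValueError(f"duplicate date {date} for service_id {self.service_id}")
        self.date_exceptions[date] = exception_type


period = ServicePeriod("weekday")
period.set_date_has_service("20201225", 2)
try:
    period.set_date_has_service("20201225", 1)
except ValueError as err:
    print(err)  # duplicate date 20201225 for service_id weekday
```

The first entry for a date wins and later ones are reported, which is the behaviour described above: one date per service_id.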


Additional context
Line 174 in the Error support priorities spreadsheet: https://docs.google.com/spreadsheets/d/1vqe6wq7ctqk1EhYkgtZ0_TbcQ91vccfs2daSjn20BLE/edit#gid=0

maximearmstrong, May 11 '20 15:05

@maximearmstrong ~~I think this rule is for calendar_dates.txt, but only in the case where calendar.txt is empty?~~ EDIT: This isn't correct; see my later comment below.

barbeau, May 11 '20 16:05

@barbeau The Python validator doesn't seem to check calendar.txt when verifying this. I thought the goal here was to verify whether a date was added twice to calendar_dates.txt. Wouldn't that check also be necessary when calendar.txt is not empty?

maximearmstrong, May 11 '20 21:05

@maximearmstrong You're allowed to have two records in calendar_dates.txt with the same date, as long as the service_id and exception_type are different.

calendar_dates.txt is used in two ways in GTFS:

  1. It defines exceptions to service defined in calendar.txt. So you may have weekdays that normally run service_id 1, but if the Dec 25th holiday falls on a Monday, you'd have 2 records here - one saying remove service_id 1 on Dec 25th, and a second one saying add service_id 2 on Dec 25th (service_id 2 being whatever special service runs on Dec 25th).
  2. If agencies don't use calendar.txt to define service, they can provide a massive dump of dates in calendar_dates.txt that says which service_id is running on which date. In this case, calendar_dates.txt is where each service_id is defined (rather than referencing calendar.txt), and every record will have exception_type 1 (service added).
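To make the two uses concrete, here is a minimal sketch of resolving which service_ids run on a given date. It assumes calendar.txt rows are dicts keyed by GTFS column names with `"0"`/`"1"` weekday flags, and calendar_dates.txt rows are `(service_id, date, exception_type)` tuples; `active_services` is a hypothetical helper, not part of any validator.

```python
from datetime import datetime

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"]


def active_services(day: str, calendar: list[dict], calendar_dates: list[tuple[str, str, int]]) -> set[str]:
    """Return the service_ids running on `day` (a GTFS YYYYMMDD string)."""
    d = datetime.strptime(day, "%Y%m%d").date()
    active = set()
    # Use 1: regular weekly service from calendar.txt (this list may be empty).
    for row in calendar:
        start = datetime.strptime(row["start_date"], "%Y%m%d").date()
        end = datetime.strptime(row["end_date"], "%Y%m%d").date()
        if start <= d <= end and row[WEEKDAYS[d.weekday()]] == "1":
            active.add(row["service_id"])
    # calendar_dates.txt: 1 = service added, 2 = service removed. With an empty
    # calendar.txt (use 2), every row is an addition. If the same (service_id, date)
    # appeared twice, the result would depend on row order, which is why such
    # duplicates are worth flagging.
    for service_id, exc_date, exception_type in calendar_dates:
        if exc_date != day:
            continue
        if exception_type == 1:
            active.add(service_id)
        elif exception_type == 2:
            active.discard(service_id)
    return active


weekday_service = {"service_id": "1", "start_date": "20230101", "end_date": "20231231",
                   "monday": "1", "tuesday": "1", "wednesday": "1", "thursday": "1",
                   "friday": "1", "saturday": "0", "sunday": "0"}
holiday = [("1", "20231225", 2), ("2", "20231225", 1)]  # Dec 25th 2023 is a Monday
print(active_services("20231225", [weekday_service], holiday))  # {'2'}
print(active_services("20231225", [], [("2", "20231225", 1)]))  # {'2'}
```

The first call mirrors use 1 (a holiday exception swapping service_id 1 for 2); the second mirrors use 2, where calendar.txt is empty and calendar_dates.txt alone defines service.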

We just need to make sure we're considering both use cases of calendar_dates.txt when we implement a rule for these tables.

barbeau, May 11 '20 21:05

We usually don't add rules that are not clearly specified in the spec or the best practices, which is the case for this one. We do make exceptions for checks the community considers valuable (point_near_origin, fast_travel_between_consecutive_stops, and unused_trip, to name a few), and these are decided case by case through discussion with the community. Here, we would prefer to have the spec modified first before adding the check to the validator.

isabelle-dr, Oct 03 '22 20:10

Coming back to this old issue: my interpretation is that the logic in the Python validator specifically enforced that there could not be multiple entries in calendar_dates.txt with the same combination of (service_id + date). Support for this same check was added to the gtfs-validator in PR #1190, when I added support for multi-column primary keys; as part of that change, the calendar_dates.txt schema was updated to specify (service_id + date) as the primary key. As such, I believe we can close this issue as "implemented".
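As an illustration of what a multi-column primary key check on (service_id + date) catches, here is a minimal sketch over raw CSV text. It is not the gtfs-validator's (Java) implementation; `find_duplicate_keys` is a hypothetical helper, and the reported row numbers assume no quoted field spans multiple lines.

```python
import csv
import io


def find_duplicate_keys(csv_text: str, key_columns: tuple[str, ...]) -> list[tuple[int, int, tuple[str, ...]]]:
    """Return (row_number, first_row_number, key) for every row whose key was already seen."""
    first_seen: dict[tuple[str, ...], int] = {}
    duplicates = []
    reader = csv.DictReader(io.StringIO(csv_text))
    # The header is row 1, so the first data record is row 2.
    for row_number, row in enumerate(reader, start=2):
        key = tuple(row[column] for column in key_columns)
        if key in first_seen:
            duplicates.append((row_number, first_seen[key], key))
        else:
            first_seen[key] = row_number
    return duplicates


calendar_dates = """service_id,date,exception_type
1,20231225,2
2,20231225,1
1,20231225,1
"""
# Rows 2 and 3 share a date but differ in service_id, so they are allowed;
# row 4 repeats row 2's (service_id, date) and is reported.
print(find_duplicate_keys(calendar_dates, ("service_id", "date")))
# [(4, 2, ('1', '20231225'))]
```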

bdferris-v2, Nov 17 '22 21:11