pfaedle icon indicating copy to clipboard operation
pfaedle copied to clipboard

Doesn't like Non-negative Integers greater than 65535

Open stupidpupil opened this issue 2 years ago • 2 comments

If an input GTFS has a field with a Non-negative Integer greater than 65535, then pfaedle will stop with a parsing error like:

data-raw/filed40c09ec85/frequencies.txt:11566: in field 'headway_secs', expected integer in range [0,65535]

(In this case, the headway_secs was 73800, or 20hrs30min.)

This is valid according to the GTFS reference which places no upper limit on Non-negative Integer fields:

Non-negative Integer - A integer greater than or equal to 0.

But, in practice, it seems unreasonable to expect tools to deal with arbitrarily large numbers, so I'm not sure I can really fault pfaedle here! (I will raise an issue with the tool that generated this GTFS as well.)

For completeness, there are two places where this field-type is used where I think you might, sort of, expect to find very large values:

  • frequencies.headway_secs, and maybe
  • transfers.min_transfer_time (also in seconds)

stupidpupil avatar May 11 '22 08:05 stupidpupil

Ah, it looks as though the tool that I used to generate this GTFS has just added a flag to limit headway_secs, so I'll try using that: https://github.com/patrickbr/gtfstidy/commit/d22ca560952b1883725123dbe3e0c4e6256cd4f1

(And might suggest reducing the default to 65535 seconds, from 24 hours.)

stupidpupil avatar May 11 '22 09:05 stupidpupil

If an input GTFS has a field with a Non-negative Integer greater than 65535, then pfaedle will stop with a parsing error […] (In this case, the headway_secs was 73800, or 20hrs30min.) […] But, in practice, it seems unreasonable to expect tools to deal with arbitrarily large numbers, so I'm not sure I can really fault pfaedle here! (I will raise an issue with the tool that generated this GTFS as well.)

Fair enough, but I'd argue that pfaedle's choice of numeric types shouldn't limit what I'm able to express in GTFS, as long as I follow the GTFS spec.

derhuerst avatar May 11 '22 10:05 derhuerst

I mostly agree, but of course it's impossible to exactly implement the GTFS spec (i.e. unlimited integer ranges) on a machine with finite memory. In the underlying library (cppgtfs) we usually chose variable types in such a way that they allow for all realistic values. We decided to use a type for the headway seconds that allows for values corresponding to at least 12 hours (unsigned 16 bit integer). Remember that the headway seconds are given relative to a specific date, so if you have a trip that is repeated every N > 12 hours, it is probably not a good idea to model this using a frequency beginning on a specific date. There are two situations I think in which this would occur:

  1. There is a single trip per day, and its schedule is modelled with a frequency that only has a single occurance per day. In this case, it would be better to bake the time into stop_times.txt explicitly.

  2. You actually have a trip that is repeated every 20 hours, over a number of M days. If you are using frequencies.txt to model this, you are assuming a single day spanning M * 24 hours, which isn't very natural. It would be better to give the explicit stop_times.txt for each trip.

Nevertheless, I agree that the standard allows for arbitrary values, and there might by a useful scenario missing from my list. We increased the max number to 2^32 in https://github.com/ad-freiburg/cppgtfs/commit/aa2714065e1ee8a4773f25aec598d781985086b4, which is now used in https://github.com/ad-freiburg/pfaedle/commit/b4b08baecabe5b48a631d6167cec9c457c497c81

patrickbr avatar Oct 04 '22 12:10 patrickbr