Strings encased in quotes are not properly unescaped/trimmed

Open domints opened this issue 5 months ago • 0 comments

Example feed: https://otwartedane.metropoliagzm.pl/dataset/rozklady-jazdy-i-lokalizacja-przystankow-gtfs-wersja-rozszerzona/resource/290298ce-944b-4744-8f92-29ab2b786a33

Essentially, the CSV deserializer does not properly handle strings that are enclosed in quotation marks ("). I saw this was a problem with colors in version 1.7; in this 3.0 beta colors are fine, but now the block_id field is affected. Admittedly, it may not always make sense to have quotation marks within an ID, since it's just an ID, but the GTFS docs say:

ID - An ID field value is an internal ID, not intended to be shown to riders, and is a sequence of any UTF-8 characters.
Using only printable ASCII characters is recommended.

So an ID can technically contain quotation marks. Also, Busman, a scheduling system widely used in Poland, seems to enclose every string in quotation marks, which breaks this lib.

I'd suggest treating every string-like field as a string and, if it's enclosed in quotation marks, unquoting it properly. Couldn't this lib rely on a well-known, well-tested CSV deserialization library?
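
To illustrate the behaviour I'd expect, here is a minimal sketch in Python using its standard `csv` module. It is not this lib's code or API, and the sample rows are made up rather than taken from the feed above. It only shows that a standards-following CSV reader strips the enclosing quotes and turns a doubled `""` into a single `"`, for block_id like for any other field.

```python
import csv
import io

# Hypothetical trips.txt excerpt in which every string is quoted,
# including block_id; the values are invented for illustration.
SAMPLE = (
    'route_id,service_id,trip_id,block_id\r\n'
    '"R1","WD","T1","B-7"\r\n'
    '"R1","WD","T2","say ""hi"""\r\n'
)


def read_rows(text):
    """Parse CSV text into dicts, unquoting fields the way RFC 4180 describes."""
    # newline="" leaves line endings untouched so the csv module handles them.
    reader = csv.DictReader(io.StringIO(text, newline=""))
    return list(reader)


rows = read_rows(SAMPLE)
for row in rows:
    # Enclosing quotes are gone; a doubled quote inside a field becomes one quote.
    print(repr(row["block_id"]))
```

With input like this, I'd expect the first block_id to come out as `B-7` without the surrounding quotes, and the second to keep only its inner quotation marks.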

domints, Sep 06 '24 14:09