Parsing a feed with quotes...

Open ukadiyala opened this issue 8 years ago • 10 comments

Hi There,

I'm trying to parse a feed which utilises quotes ("") in addition to commas in every file. Would you have an example of how I could configure the reader to discard the quotes?

All files seem to be parsing except the calendar file. Here is a sample of what it looks like:

service_id,monday,tuesday,wednesday,thursday,friday,saturday,sunday,start_date,end_date
FULLW,1,1,1,1,1,1,1,20160714,20161014
WE,0,0,0,0,0,1,1,20160714,20161014
"Z1+1","1","1","1","1","1","1","1","20160714","20161014"

The first two data lines, which come from the test file, parse fine. The last line, which comes from the feed, does not, and I get an error message saying: "Could not parse value "20161014" in field end_date in file calendar.".

I'm pretty sure there is a configuration item I'm missing. This also makes me wonder about the rest of the data. Can you please help?

Regards, Udhay

ukadiyala avatar Jul 26 '16 21:07 ukadiyala

Can you provide a sample feed or build a unittest that simulates this? Will make it a lot easier to track down this issue...

xivk avatar Jul 27 '16 07:07 xivk

Hi There... Thank you for the prompt response... Attached is a sample .zip extract from a wider set I'm working with... I also noticed the same problem with the calendar_dates file...

Thanks and look forward to hearing from you soon...

Sample.zip

ukadiyala avatar Jul 27 '16 12:07 ukadiyala

Hi there... I'm hoping that you have managed to open the sample files and reproduce the issue I'm facing... any recommendations from your end?

ukadiyala avatar Jul 28 '16 12:07 ukadiyala

Hi There,

Hope all is well on your end. I have not heard back from you, so I took the liberty of replicating your source code on my machine and stepping through it.

The problem seems to be in the MoveNext() method of the CSVStreamReader class. Upon looking further into it, the 'line' variable carrying the new line seems to have an additional '' after the ".

I was wondering if I could configure it to use a line pre-processor to solve this problem. Your thoughts?

Regards, Udhay

ukadiyala avatar Aug 01 '16 12:08 ukadiyala

Done... Solved it with a line pre-processor delegate...

Used the code below to configure it:

reader.LinePreprocessor = delegate (string s) { return s.Replace("\"", ""); };
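The pre-processing idea above can be sketched outside the library. This is a minimal illustration in Python rather than the library's C#. The names `strip_quotes` and `naive_split` are hypothetical and are not part of the GTFS library. It shows the quoted calendar line splitting cleanly once the quotes are removed, and also the main risk of the approach: a quoted field that itself contains a comma gets split in two.

```python
import csv
import io

def strip_quotes(line):
    """Hypothetical line pre-processor: remove every double quote."""
    return line.replace('"', "")

def naive_split(line):
    """Split on commas with no awareness of quoting."""
    return line.split(",")

# The quoted calendar line from the feed in this issue.
quoted = '"Z1+1","1","1","1","1","1","1","1","20160714","20161014"'
fields = naive_split(strip_quotes(quoted))

# Caveat: stripping quotes loses the protection they give embedded commas.
# This sample value is made up for illustration.
tricky = '"Main St, North","1"'
broken = naive_split(strip_quotes(tricky))        # splits into three fields
proper = next(csv.reader(io.StringIO(tricky)))    # quote-aware: two fields
```

So stripping quotes is probably fine for files like calendar that hold only IDs, flags and dates, but it looks unsafe for files with free-text columns such as stop or route names.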

Also, was wondering if you have any samples handy of the invalid feeds you are accounting for in the MoveNext() method... I wonder if looping around might be the best thing to do in a portable class library...

Do let me know...

ukadiyala avatar Aug 01 '16 21:08 ukadiyala

Sorry, I'm maintaining this in my spare time so I didn't have time to check this. Are you using Mono on OSX/Linux or .NET on Windows?

xivk avatar Aug 02 '16 09:08 xivk

No worries... We are all busy people... I understand... You seem to have written a good library here... Happy to contribute... I'm utilising .NET on Windows...

ukadiyala avatar Aug 02 '16 11:08 ukadiyala

NOT Fixed... Damn it!!! Still an issue. I have done a full circle and come back to the beginning...

ukadiyala avatar Aug 05 '16 19:08 ukadiyala

As I said, I'm doing this in my spare time and it can take a while... you can always submit a pull request and I can create a new build/package for you...

xivk avatar Aug 05 '16 20:08 xivk

Just stumbled upon the same issue myself -- looks like CSVStreamReader doesn't correctly handle the case when the last item is in quotes.

https://github.com/itinero/GTFS/blob/develop/src/GTFS/IO/CSV/CSVStreamReader.cs#L155

Apparently, the last item is taken "as-is" (substring starting from the position after the last comma to the end) as opposed to the rest, which is correctly checked for quotes.
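To make the behaviour described above concrete, here is a minimal sketch in Python rather than the library's C#. It is not the actual CSVStreamReader code. `split_line`, `unquote` and the `unquote_last` flag are hypothetical names used only to contrast the two behaviours: taking the last field "as-is" versus giving it the same quote check as the others.

```python
def unquote(field):
    """Strip one pair of surrounding double quotes, if present."""
    field = field.strip()
    if len(field) >= 2 and field[0] == '"' and field[-1] == '"':
        # Inside a quoted field, a doubled quote stands for a literal quote.
        return field[1:-1].replace('""', '"')
    return field

def split_line(line, unquote_last):
    """Split a CSV line on commas that are outside quotes.

    unquote_last=False mimics the reported behaviour: the final field is
    taken as the raw substring after the last separator.
    """
    fields = []
    start = 0
    in_quotes = False
    for i, ch in enumerate(line):
        if ch == '"':
            in_quotes = not in_quotes
        elif ch == "," and not in_quotes:
            fields.append(unquote(line[start:i]))
            start = i + 1
    last = line[start:]
    fields.append(unquote(last) if unquote_last else last)
    return fields

line = '"Z1+1","1","1","1","1","1","1","1","20160714","20161014"'
buggy = split_line(line, unquote_last=False)
fixed = split_line(line, unquote_last=True)
```

With `unquote_last=False`, the final value still carries its quotes, which would match the "Could not parse value "20161014" in field end_date" error reported at the top of this issue. Treating the last field like the others yields a plain `20161014`.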

simon-meer avatar Oct 05 '17 21:10 simon-meer