GTFS
Parsing a feed with quotes...
Hi There,
I'm trying to parse a feed that wraps values in double quotes ("") in addition to separating them with commas, in every file. Do you have an example of how I could configure the reader to discard the quotes?
All files seem to parse except the calendar file. Here is a sample of what it looks like:
service_id,monday,tuesday,wednesday,thursday,friday,saturday,sunday,start_date,end_date
FULLW,1,1,1,1,1,1,1,20160714,20161014
WE,0,0,0,0,0,1,1,20160714,20161014
"Z1+1","1","1","1","1","1","1","1","20160714","20161014"
The first two data rows parse, as they come from the test file. The last row, which comes from the feed, does not, and I get the error: "Could not parse value "20161014" in field end_date in file calendar.".
I'm pretty sure there is a configuration option I'm missing. This also makes me wonder about the rest of the data. Can you please help?
Regards, Udhay
Can you provide a sample feed or build a unit test that simulates this? That will make it a lot easier to track down this issue...
Hi There... Thank you for the prompt response... Attached is a sample .zip extract from a wider set I'm working with... I also noticed the same problem with the calendar_dates file...
Thanks and look forward to hearing from you soon...
Hi there... I'm hoping that you have managed to open the sample files and reproduce the issue I'm facing... any recommendations from your end?
Hi There,
Hope all is well on your end. I haven't heard back from you, so I took the liberty of copying your source code to my machine and stepping through it.
The problem seems to be in the MoveNext() method of the CSVStreamReader class. Looking further into it, the 'line' variable holding the new line seems to have an additional '' after the ".
I was wondering if I could configure it to use a line pre-processor to solve this problem. Your thoughts?
Regards, Udhay
Done... Solved it with a line pre-processor delegate...
I used the code below to configure it:
reader.LinePreprocessor = delegate (string s) { return s.Replace("\"", ""); };
Also, I was wondering if you have any samples handy of the invalid feeds you are accounting for in the MoveNext() method... I wonder whether looping there is the best approach in a portable class library...
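To make the workaround's behaviour concrete, here is the same logic as a standalone `Func<string, string>`. This is a sketch under an assumption the thread does not confirm: that the reader passes each raw line through `LinePreprocessor` before splitting it on commas. The class name `QuoteStrippingPreprocessor` is hypothetical and not part of the library.

```csharp
using System;

// Hypothetical standalone version of the LinePreprocessor workaround.
// Assumption: the reader calls the preprocessor once per raw line and
// splits the returned string on commas afterwards.
public static class QuoteStrippingPreprocessor
{
    // Removes every double-quote character from the line, so
    // "Z1+1","1","20161014" becomes Z1+1,1,20161014.
    public static readonly Func<string, string> StripQuotes =
        line => line.Replace("\"", "");
}
```

Stripping every quote is blunt: a value such as "Main St, North" loses the quotes that protected its comma and is then split into two fields, and an escaped quote ("") inside a value disappears. It only suits feeds where no value contains a comma or a quote.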
Do let me know...
Sorry, I'm maintaining this in my spare time, so I didn't have time to check this. Are you using Mono on OSX/Linux or .NET on Windows?
No worries... We are all busy people... I understand... You seem to have written a good library here... Happy to contribute... I'm using .NET on Windows...
NOT fixed... Damn it!!! Still an issue. I've come full circle, back to the beginning...
As I said, I'm doing this in my spare time and it can take a while... You can always submit a pull request, and I can create a new build/package for you...
Just stumbled upon the same issue myself. It looks like CSVStreamReader doesn't correctly handle the case where the last item is in quotes.
https://github.com/itinero/GTFS/blob/develop/src/GTFS/IO/CSV/CSVStreamReader.cs#L155
Apparently, the last item is taken "as-is" (the substring from the position after the last comma to the end of the line), whereas the other items are correctly checked for quotes.
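A minimal sketch of the kind of fix this points to, assuming each record fits on a single line: walk the line character by character and unquote every field with the same logic, flushing the last field through the same buffer instead of taking the remaining substring as-is. `CsvLineSplitter.Split` is a hypothetical helper for illustration, not the actual CSVStreamReader code.

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical helper, not the library's CSVStreamReader: splits one CSV
// line and unquotes every field the same way, including the last one.
public static class CsvLineSplitter
{
    public static List<string> Split(string line, char separator = ',')
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;

        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (c == '"')
            {
                if (inQuotes && i + 1 < line.Length && line[i + 1] == '"')
                {
                    // An escaped quote ("") inside a quoted field becomes one quote.
                    current.Append('"');
                    i++;
                }
                else
                {
                    // Opening or closing quote: toggle state, drop the character.
                    inQuotes = !inQuotes;
                }
            }
            else if (c == separator && !inQuotes)
            {
                // A separator outside quotes ends the current field.
                fields.Add(current.ToString());
                current.Clear();
            }
            else
            {
                current.Append(c);
            }
        }

        // The last field goes through the same builder, so its quotes were
        // already removed by the loop rather than being kept as-is.
        fields.Add(current.ToString());
        return fields;
    }
}
```

With the last field unquoted like the others, the "Z1+1" calendar row yields 20161014 for end_date, no line preprocessor is needed, and commas inside quoted values survive. Quoted values that span several lines would still need handling at the reader level.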