CHCSVParser

Fails to parse record with unescaped parenthesis

Open marvin-yorke opened this issue 10 years ago • 10 comments

given the record

16681;6;Orehovyj boulevard, ul. Musy Dzhalilja (odd side);20;out;55.6141571054;37.7460757208;800;34;34;0;0;0;0;0;1

library fails to parse the 3rd field with the following error:

Unexpected delimiter. Expected ';' (0x3B), but got '(' (0x28)

Is there any way to parse this data without altering it (e.g. by adding quotes)?

marvin-yorke avatar Mar 05 '15 08:03 marvin-yorke

Thanks for reporting this. I added a unit test to parse the exact text you provided, and it seems to have no problem with it. I tried parsing it as the in-memory string, and as a file written to disk (which is similar to what your code was doing). Both tests pass without modification to the parser, so I'm not sure what the issue is here.

davedelong avatar Mar 07 '15 17:03 davedelong

Are the URLs you're parsing remote (coming in over a network connection) or local file URLs? Do you have an example of either that I could try?

davedelong avatar Mar 07 '15 17:03 davedelong

Hi Dave, I'm downloading an archive from the server, unpacking it into the Documents directory, and supplying a URL to the file in the Documents directory. You can find the files I'm using in the following archive: http://metro4all.org/data/msk.zip The file where I encountered the problem is portals_ru.csv

marvin-yorke avatar Mar 07 '15 17:03 marvin-yorke

Thanks @marvin-yorke. I incorporated the portals.csv file into the unit tests, but they're still passing on my machine. 😕

davedelong avatar Mar 07 '15 18:03 davedelong

Hm, ok, I've cloned the repo and run the tests, and it works on my machine too. I should have mentioned that the original case was observed on iOS, not OS X. Could this make any difference? Also, I installed the library from CocoaPods, not from GitHub, although there's no major difference from the latest code. Anyway, I'll try again with my iOS app and let you know the results.

marvin-yorke avatar Mar 07 '15 18:03 marvin-yorke

I've checked the issue again, and here's the line that breaks the parsing:

17530;2;"Крокус Экспо" (павильон 1, 2);215;both;55.8235522598;37.3855503584;800;56;0;0;0;400;950;23;0

Turns out that it's not the parentheses that cause the issue, but the quotes. And now I'm not quite sure whether it's a parser problem or my data is malformed. What do you think?

marvin-yorke avatar Mar 08 '15 16:03 marvin-yorke

Yes, that is a problem with the data. When the parser encounters a field that starts with ", it assumes the field ends with the corresponding closing ". And then since the next character after the closing " isn't a delimiter (;), it aborts with an error.
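
To make the rule above concrete, here is a minimal Python sketch of a field splitter that treats a leading " the same way. It is a toy written for illustration, not CHCSVParser's actual implementation; the function name `split_record` and its error wording are assumptions.

```python
def split_record(line, delimiter=";"):
    """Toy splitter: a field that starts with '"' must end at its closing '"'."""
    fields, i, n = [], 0, len(line)
    while True:
        if i < n and line[i] == '"':
            # Quoted field: collect characters up to the closing quote.
            i += 1
            buf = []
            while i < n:
                if line[i] == '"':
                    if i + 1 < n and line[i + 1] == '"':
                        buf.append('"')  # a doubled quote is a literal quote
                        i += 2
                        continue
                    break  # found the closing quote
                buf.append(line[i])
                i += 1
            else:
                raise ValueError("Unterminated quoted field")
            i += 1  # step past the closing quote
            if i < n and line[i] != delimiter:
                # Text follows the closing quote, so the record is rejected.
                raise ValueError(
                    f"Expected {delimiter!r} after closing quote, got {line[i]!r}"
                )
            fields.append("".join(buf))
        else:
            # Unquoted field: everything up to the next delimiter.
            j = line.find(delimiter, i)
            if j == -1:
                j = n
            fields.append(line[i:j])
            i = j
        if i >= n:
            return fields
        i += 1  # step past the delimiter


GOOD = ("16681;6;Orehovyj boulevard, ul. Musy Dzhalilja (odd side);20;out;"
        "55.6141571054;37.7460757208;800;34;34;0;0;0;0;0;1")
BAD = ('17530;2;"Крокус Экспо" (павильон 1, 2);215;both;'
       "55.8235522598;37.3855503584;800;56;0;0;0;400;950;23;0")

print(len(split_record(GOOD)))  # parentheses alone are not a problem
try:
    split_record(BAD)
except ValueError as err:
    print("rejected:", err)
```

The first record parses because its third field does not start with a quote, so the parentheses are ordinary characters. The second is rejected as soon as something other than the delimiter follows the closing quote.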

davedelong avatar Mar 08 '15 16:03 davedelong

Then could you please help me figure out how to correct my data?

marvin-yorke avatar Mar 09 '15 06:03 marvin-yorke

The solution seems pretty clear: don't start a field with quoted text, or, if a field does start with quoted text, wrap the whole field in quotes.
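
If editing the file by hand is impractical, the second option can be applied as a preprocessing step. The Python sketch below wraps any field containing a quote in quotes and doubles the inner quotes, which is the usual CSV escaping convention. The helper name `repair_line` is hypothetical, and the sketch assumes that every quote in the input is literal text and that no field contains the delimiter; a file that already has correctly quoted fields would need a smarter check.

```python
import csv


def repair_line(line, delimiter=";"):
    """Quote every field that contains a literal '"', doubling the inner quotes.

    Assumption: no field contains the delimiter, so a plain split is safe,
    and no field is already correctly quoted.
    """
    fixed = []
    for field in line.split(delimiter):
        if '"' in field:
            field = '"' + field.replace('"', '""') + '"'
        fixed.append(field)
    return delimiter.join(fixed)


BAD = ('17530;2;"Крокус Экспо" (павильон 1, 2);215;both;'
       "55.8235522598;37.3855503584;800;56;0;0;0;400;950;23;0")

repaired = repair_line(BAD)
print(repaired)

# Check the result with Python's own csv module, used here as a stand-in
# for any parser that follows the doubled-quote convention.
row = next(csv.reader([repaired], delimiter=";"))
print(row[2])
```

After the repair, the third field is read back with its original quotes intact, and the rest of the record is unchanged.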

danieljfarrell avatar Mar 09 '15 09:03 danieljfarrell

Is there a property that can turn off this behavior, or some workaround that doesn't require me to edit the file I am parsing?

Edit: I added one, and it seems to work fine now :)

h3dkandi avatar Apr 27 '15 12:04 h3dkandi