CSV.jl

Unable to parse really long CSV cell (breaks Parsers.jl)

Open hs-ye opened this issue 4 years ago • 4 comments

Hi Team,

I have an unusual situation where I'm trying to read CSV files with really long cells (geospatial data), about 150k characters per row (sample attached).

Using the default CSV.File method with quote characters (see the attached sample file), I get the error below. Following the stack trace, it seems the problem is in how Parsers.jl reads long strings from a file using its custom byte index, which only supports a maximum length of ~100k characters.

segment_mini.csv

Error stacktrace:

csv

What's the stance on supporting this type of use case in CSV.jl? Will there ever be support for super long lines, or should I raise this over at the Parsers.jl GitHub instead?

hs-ye avatar Oct 21 '21 12:10 hs-ye

Just a clarification that the ~100K character limit is per cell, not per row. I think we can support double the current length without too much trouble; we just need to add the bigger definition in Parsers.jl, then provide a way in CSV.jl, probably just via a keyword arg, to specify that you need/want the larger PosLen.
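
For readers unfamiliar with why a packed position/length value caps the cell size, here is a minimal Python sketch of the general idea. It is not the Parsers.jl implementation: the 17-bit length field is an assumption picked only because 2^17 is close to the ~100K figure mentioned in this thread, and the names `pack` and `unpack` are hypothetical.

```python
# Illustrative sketch only, NOT the Parsers.jl code.
# Idea: store a cell's start position and its length in one 64-bit integer.
# ASSUMPTION: 17 bits for the length (2**17 - 1 = 131071), chosen to roughly
# match the ~100K per-cell limit discussed in the thread.
TOTAL_BITS = 64
FLAG_BITS = 2          # assumed: a couple of bits reserved for flags
LEN_BITS = 17          # assumed width of the length field


def pack(pos, length, len_bits=LEN_BITS):
    """Pack (pos, length) into one integer; fail if either field overflows."""
    pos_bits = TOTAL_BITS - FLAG_BITS - len_bits
    max_len = (1 << len_bits) - 1
    max_pos = (1 << pos_bits) - 1
    if not 0 <= length <= max_len:
        raise ValueError(f"length {length} exceeds {len_bits}-bit maximum {max_len}")
    if not 0 <= pos <= max_pos:
        raise ValueError(f"position {pos} exceeds {pos_bits}-bit maximum {max_pos}")
    return (pos << len_bits) | length


def unpack(packed, len_bits=LEN_BITS):
    """Recover (pos, length) from a packed integer."""
    return packed >> len_bits, packed & ((1 << len_bits) - 1)
```

Under these assumptions, a 150,000-character cell does not fit in the length field, while giving the length one extra bit (at the cost of one position bit) doubles the maximum cell length, which is the kind of trade-off a "bigger definition" would make.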

quinnj avatar Oct 21 '21 16:10 quinnj

Sorry, yes, per cell is correct; it's a limitation of the current PosLen primitive used for strings. In the data it's just the one column, geoj_segment, that holds a polygon of GPS coordinates, which can be really long (see the first data row in the sample I provided).

Double the length would be amazing; I think 150k is the largest cell we have right now. I'm also looking at compressing/truncating the data on my end to solve my immediate problem, but if this could be a future feature it would help a lot!
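
Before deciding how much to compress or truncate, it can help to measure which columns actually exceed the limit. The following is a rough Python sketch for that check, independent of CSV.jl; the function name `longest_cells` is hypothetical, and counting UTF-8 bytes rather than characters is an assumption based on the byte index mentioned earlier in the thread.

```python
import csv
import io


def longest_cells(stream, quotechar='"'):
    """Return {column name: size in UTF-8 bytes of its longest cell}."""
    # Python's csv module has its own default field size limit (131072),
    # so raise it first or very long cells will fail here as well.
    csv.field_size_limit(2**31 - 1)
    reader = csv.reader(stream, quotechar=quotechar)
    header = next(reader, None)
    if header is None:
        return {}
    longest = {name: 0 for name in header}
    for row in reader:
        for name, cell in zip(header, row):
            # ASSUMPTION: the limit applies to bytes, so measure encoded size.
            size = len(cell.encode("utf-8"))
            if size > longest[name]:
                longest[name] = size
    return longest


# Small in-memory demonstration with one oversized quoted cell.
sample = io.StringIO('id,geoj_segment\n1,"' + "x" * 150_000 + '"\n2,"a,b"\n')
print(longest_cells(sample))
```

For a real file, the stream could be opened with `open("segment_mini.csv", newline="", encoding="utf-8")` and passed to the same function.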

hs-ye avatar Oct 21 '21 21:10 hs-ye

Some thoughts/initial work at increasing capacity in Parsers.jl: https://github.com/JuliaData/Parsers.jl/pull/98

quinnj avatar Oct 22 '21 06:10 quinnj

Bumping this issue, since I'm running into a similar problem, and it appears there's no alternative parsing option in this case.

brad-ross avatar Apr 27 '23 03:04 brad-ross