CSV.jl
Bug when parsing complex CSV with multi-threading enabled
The problem is described in https://discourse.julialang.org/t/error-task-failed-exception-reading-csv/86544.
Most likely, the cause of the problem is that the file has multi-line fields that are wrapped in `"`.
The dataset includes a long text in which the `"` character is escaped by a `\`. Using `escapechar='\\'` in the options solves this issue, so I don't think it's a bug; maybe it's just a discoverability issue with that option?
I have checked this, and using `escapechar='\\'` does not solve the issue. Also note that the file is read correctly when parsing is single-threaded, simply by using `df = CSV.read("DataEngineer.csv", DataFrame, ntasks=1)`.
Also, I have checked that there is indeed an issue with embedded `"` characters; an example of such a situation is:
"Job Description
<here I cut out irrelevant multi-line input>
applicable state and local \""Fair Chance\"" laws."
With this input, setting `escapechar='\\'` leads to errors. The default setting `escapechar='"'` seems correct, since the text then just gets parsed as `local \"Fair Chance\" laws`, which is maybe not ideal, but at least correctly respects the field delimiter.
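To illustrate the default behaviour described above, here is a minimal sketch that parses a small inline string mimicking the excerpt. The `id`/`description` columns and the sample text are hypothetical, not taken from DataEngineer.csv. It assumes CSV.jl 0.10 or later (where `ntasks` is a keyword argument) and DataFrames.jl. With the default `quotechar='"'` and `escapechar='"'`, each `""` inside a quoted field is read as a single literal `"`, while the backslash is kept as an ordinary character.

```julia
using CSV, DataFrames

# Hypothetical input: the quoted `description` field spans two lines and
# contains the `\""` sequences seen in the excerpt from the dataset.
data = """
id,description
1,"Job Description
applicable state and local \\""Fair Chance\\"" laws."
"""

# Default options: quotechar and escapechar are both '"', so `""` inside a
# quoted field becomes one `"` and the preceding backslash is left as-is.
# ntasks=1 forces single-threaded parsing (the workaround from this thread).
df = CSV.read(IOBuffer(data), DataFrame; ntasks=1)

println(df.description[1])
```

The printed value should end in `local \"Fair Chance\" laws.`, matching what the default setting produces on the real file.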
My bad, you are absolutely right: `escapechar='\\'` is simply wrong here. Apologies for the noise!