
CSV.jl fails to parse a file that DuckDB is fine with

Open asinghvi17 opened this issue 1 year ago • 2 comments

MWE:

import CSV, QuackIO
using DataFrames

file = download("https://raw.githubusercontent.com/newzealandpaul/Maritime-Pirate-Attacks/refs/heads/main/data/csv/pirate_attacks.csv")

# try QuackIO first
dataset = QuackIO.read_csv(DataFrame, file) # works

# now try CSV
CSV.read(file, DataFrame) # errors

The error:

ERROR: TaskFailedException

    nested task error: thread = 7 fatal error, encountered an invalidly quoted field while parsing around row = 4573, col = 12: ""03.10.2018: 2330 UTC: Posn: 38:49.2N – 118:14.5E, Tianjin Anchorage, China.
    ", error=INVALID: OK | QUOTED | EOF | INVALID_QUOTED_FIELD , check your `quotechar` arguments or manually fix the field in the file itself
    
    Stacktrace:
     [1] fatalerror(buf::Vector{UInt8}, pos::Int64, len::Int64, code::Int16, row::Int64, col::Int64)
       @ CSV ~/.julia/packages/CSV/cwX2w/src/file.jl:590
     [2] parsevalue!(::Type{…}, buf::Vector{…}, pos::Int64, len::Int64, row::Int64, rowoffset::Int64, i::Int64, col::CSV.Column, ctx::CSV.Context)
       @ CSV ~/.julia/packages/CSV/cwX2w/src/file.jl:798
     [3] parserow
       @ ~/.julia/packages/CSV/cwX2w/src/file.jl:640 [inlined]
     [4] parsefilechunk!(ctx::CSV.Context, pos::Int64, len::Int64, rowsguess::Int64, rowoffset::Int64, columns::Vector{…}, ::Type{…})
       @ CSV ~/.julia/packages/CSV/cwX2w/src/file.jl:550
     [5] multithreadparse(ctx::CSV.Context, pertaskcolumns::Vector{…}, rowchunkguess::Int64, i::Int64, rows::Vector{…}, wholecolumnslock::ReentrantLock)
       @ CSV ~/.julia/packages/CSV/cwX2w/src/file.jl:360
     [6] (::CSV.var"#34#39"{CSV.Context, Vector{Vector{CSV.Column}}, Int64, Int64, Vector{Int64}, ReentrantLock})()
       @ CSV ~/.julia/packages/WorkerUtilities/ey0fP/src/WorkerUtilities.jl:384
Stacktrace:
 [1] sync_end(c::Channel{Any})
   @ Base ./task.jl:455
 [2] macro expansion
   @ ./task.jl:487 [inlined]
 [3] CSV.File(ctx::CSV.Context, chunking::Bool)
   @ CSV ~/.julia/packages/CSV/cwX2w/src/file.jl:240
 [4] File
   @ ~/.julia/packages/CSV/cwX2w/src/file.jl:227 [inlined]
 [5] #File#32
   @ ~/.julia/packages/CSV/cwX2w/src/file.jl:223 [inlined]
 [6] CSV.File(source::String)
   @ CSV ~/.julia/packages/CSV/cwX2w/src/file.jl:162
 [7] read(source::String, sink::Type; copycols::Bool, kwargs::@Kwargs{})
   @ CSV ~/.julia/packages/CSV/cwX2w/src/CSV.jl:117
 [8] read(source::String, sink::Type)
   @ CSV ~/.julia/packages/CSV/cwX2w/src/CSV.jl:113
 [9] top-level scope
   @ REPL[223]:1
Some type information was truncated. Use `show(err)` to see complete types.

I tried tracking down the error, but that area of the file looked fine, both at the reported row and wherever the quoted text appears...

asinghvi17 · Sep 30 '24 21:09

Hi @asinghvi17

I ran the code above and did not get any errors; the final output was the DataFrame itself. Perhaps this issue is related to the Julia installation?

AmeroIL · Oct 20 '24 14:10

This CSV file seems to contain multi-line quoted fields, so I think the underlying issue is the same as the one reported in #1139 and #1140. It may also relate to #1157.

TimG1964 · May 16 '25 11:05
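The snippet below is a minimal, hypothetical Base-Julia sketch of the mechanism suggested above. It assumes the failure comes from multithreaded chunking, since the stack trace passes through `multithreadparse`. It is not CSV.jl's actual chunking code, and `SAMPLE`, `naive_rows`, `quote_aware_rows`, and `naive_chunk_start` are made-up names used only for illustration. The sketch shows two things. First, treating every newline as a row boundary splits a multi-line quoted field in two. Second, a task that starts partway through the buffer and skips ahead to the next newline can land inside a quoted field, so it sees a fragment ending in a stray quote. That situation resembles the `INVALID_QUOTED_FIELD` error above.

```julia
# Hypothetical illustration only (not CSV.jl internals): why cutting a CSV
# buffer at arbitrary newlines breaks when a quoted field spans lines.

const SAMPLE = "id,desc\n1,\"line one\nline two\"\n2,plain\n"

# Naive approach: treat every '\n' as a row terminator.
naive_rows(s::AbstractString) = filter(!isempty, split(s, '\n'))

# Quote-aware approach: only a '\n' outside double quotes ends a row.
# An escaped quote ("") toggles the state twice, so it cancels out.
function quote_aware_rows(s::AbstractString)
    rows = String[]
    buf = IOBuffer()
    inquotes = false
    for c in s
        if c == '"'
            inquotes = !inquotes
        end
        if c == '\n' && !inquotes
            push!(rows, String(take!(buf)))
        else
            write(buf, c)  # keeps newlines that sit inside quotes
        end
    end
    rest = String(take!(buf))
    isempty(rest) || push!(rows, rest)
    return rows
end

# Emulate a worker task that is handed byte offset `pos` and assumes
# (wrongly, for quoted multi-line data) that the next '\n' ends a row.
function naive_chunk_start(s::String, pos::Int)
    i = findnext('\n', s, pos)
    return i === nothing ? lastindex(s) + 1 : i + 1
end

naive = naive_rows(SAMPLE)          # 4 rows: the quoted field is split in two
aware = quote_aware_rows(SAMPLE)    # 3 rows: header plus two records

# Offset 12 falls inside the quoted field "line one\nline two".
fragment = SAMPLE[naive_chunk_start(SAMPLE, 12):end]
# The worker's first "row" now begins with `line two"`, i.e. a stray quote.
```

If chunking is indeed the cause, passing `ntasks=1` to `CSV.read` or `CSV.File` makes CSV.jl parse the file serially on a single task and may sidestep the error. This has not been verified against this particular file.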