CSV.jl
CSV.jl fails to parse a file that DuckDB is fine with
MWE:
import CSV, QuackIO
using DataFrames
file = download("https://raw.githubusercontent.com/newzealandpaul/Maritime-Pirate-Attacks/refs/heads/main/data/csv/pirate_attacks.csv")
# try QuackIO first
dataset = QuackIO.read_csv(DataFrame, file) # works
# now try CSV
CSV.read(file, DataFrame) # errors
The error:
ERROR: TaskFailedException
nested task error: thread = 7 fatal error, encountered an invalidly quoted field while parsing around row = 4573, col = 12: ""03.10.2018: 2330 UTC: Posn: 38:49.2N – 118:14.5E, Tianjin Anchorage, China.
", error=INVALID: OK | QUOTED | EOF | INVALID_QUOTED_FIELD , check your `quotechar` arguments or manually fix the field in the file itself
Stacktrace:
[1] fatalerror(buf::Vector{UInt8}, pos::Int64, len::Int64, code::Int16, row::Int64, col::Int64)
@ CSV ~/.julia/packages/CSV/cwX2w/src/file.jl:590
[2] parsevalue!(::Type{…}, buf::Vector{…}, pos::Int64, len::Int64, row::Int64, rowoffset::Int64, i::Int64, col::CSV.Column, ctx::CSV.Context)
@ CSV ~/.julia/packages/CSV/cwX2w/src/file.jl:798
[3] parserow
@ ~/.julia/packages/CSV/cwX2w/src/file.jl:640 [inlined]
[4] parsefilechunk!(ctx::CSV.Context, pos::Int64, len::Int64, rowsguess::Int64, rowoffset::Int64, columns::Vector{…}, ::Type{…})
@ CSV ~/.julia/packages/CSV/cwX2w/src/file.jl:550
[5] multithreadparse(ctx::CSV.Context, pertaskcolumns::Vector{…}, rowchunkguess::Int64, i::Int64, rows::Vector{…}, wholecolumnslock::ReentrantLock)
@ CSV ~/.julia/packages/CSV/cwX2w/src/file.jl:360
[6] (::CSV.var"#34#39"{CSV.Context, Vector{Vector{CSV.Column}}, Int64, Int64, Vector{Int64}, ReentrantLock})()
@ CSV ~/.julia/packages/WorkerUtilities/ey0fP/src/WorkerUtilities.jl:384
Stacktrace:
[1] sync_end(c::Channel{Any})
@ Base ./task.jl:455
[2] macro expansion
@ ./task.jl:487 [inlined]
[3] CSV.File(ctx::CSV.Context, chunking::Bool)
@ CSV ~/.julia/packages/CSV/cwX2w/src/file.jl:240
[4] File
@ ~/.julia/packages/CSV/cwX2w/src/file.jl:227 [inlined]
[5] #File#32
@ ~/.julia/packages/CSV/cwX2w/src/file.jl:223 [inlined]
[6] CSV.File(source::String)
@ CSV ~/.julia/packages/CSV/cwX2w/src/file.jl:162
[7] read(source::String, sink::Type; copycols::Bool, kwargs::@Kwargs{})
@ CSV ~/.julia/packages/CSV/cwX2w/src/CSV.jl:117
[8] read(source::String, sink::Type)
@ CSV ~/.julia/packages/CSV/cwX2w/src/CSV.jl:113
[9] top-level scope
@ REPL[223]:1
Some type information was truncated. Use `show(err)` to see complete types.
I tried tracking down the error, but everything in that area of the file looked fine, both at the row mentioned and where I found the quoted text by searching.
Hi @asinghvi17
I ran the code above and did not get any errors; the final output was the DataFrame itself. Perhaps this issue is related to the Julia installation?
This CSV file seems to contain multi-line quoted fields, so I think the underlying issue is the same as the one reported in #1139 and #1140. It may also be related to #1157.
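A possible workaround, sketched under one assumption: the stack trace goes through `multithreadparse`, so the failure may come from the file being split into chunks across tasks, with a chunk boundary landing inside a quoted field that spans several lines. This has not been confirmed for this file. If that is the cause, asking CSV.jl to parse on a single task with the `ntasks=1` keyword should avoid it. The small file below is a made-up stand-in for the real dataset, and the column names `id` and `narration` are hypothetical.

```julia
import CSV
using DataFrames

# Hypothetical stand-in file: the first record has a quoted field that spans two lines.
path = tempname() * ".csv"
write(path, "id,narration\n1,\"first line\nsecond line\"\n2,\"single line\"\n")

# ntasks=1 requests single-task parsing, so the file is not split into chunks.
df = CSV.read(path, DataFrame; ntasks=1)
```

For the file in the MWE, the equivalent call would be `CSV.read(file, DataFrame; ntasks=1)`. Single-task parsing is slower on large files, and it might also explain why the error does not reproduce everywhere, since the number of parsing tasks depends on how many threads Julia was started with.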