CSV.read error with limit on multiple threads

Open bkamins opened this issue 3 years ago • 3 comments

This was run with 8 threads on a large file:

julia> describe(CSV.read("instagram_locations.csv", DataFrame, limit=1000), :eltype)
ERROR: TaskFailedException

    nested task error: BoundsError: attempt to access 1000-element Vector{UInt32} at index [1001]
    Stacktrace:
     [1] setindex!
       @ .\array.jl:966 [inlined]
     [2] checkpooled!(#unused#::Type{Union{Missing, String31}}, pertaskcolumns::Vector{Vector{CSV.Column}}, col::CSV.Column, j::Int64, ntasks::Int64, nrows::Int64, ctx::CSV.Context)
       @ CSV ~\.julia\packages\CSV\1P1tQ\src\file.jl:513
     [3] multithreadpostparse(ctx::CSV.Context, ntasks::Int64, pertaskcolumns::Vector{Vector{CSV.Column}}, rows::Vector{Int64}, finalrows::Int64, j::Int64, col::CSV.Column)
       @ CSV ~\.julia\packages\CSV\1P1tQ\src\file.jl:432
     [4] macro expansion
       @ ~\.julia\packages\WorkerUtilities\ey0fP\src\WorkerUtilities.jl:384 [inlined]
     [5] (::CSV.var"#31#36"{CSV.Context, Int64, Vector{Vector{CSV.Column}}, Vector{Int64}, Int64, Int64, CSV.Column})()
       @ CSV .\threadingconstructs.jl:258
Stacktrace:
 [1] sync_end(c::Channel{Any})
   @ Base .\task.jl:436
 [2] macro expansion
   @ .\task.jl:455 [inlined]
 [3] CSV.File(ctx::CSV.Context, chunking::Bool)
   @ CSV ~\.julia\packages\CSV\1P1tQ\src\file.jl:281
 [4] File
   @ ~\.julia\packages\CSV\1P1tQ\src\file.jl:226 [inlined]
 [5] #File#28
   @ ~\.julia\packages\CSV\1P1tQ\src\file.jl:222 [inlined]
 [6] read(source::String, sink::Type; copycols::Bool, kwargs::Base.Pairs{Symbol, Int64, Tuple{Symbol}, NamedTuple{(:limit,), Tuple{Int64}}})
   @ ~\.julia\packages\CSV\1P1tQ\src\CSV.jl:117
 [7] top-level scope
   @ REPL[10]:1

bkamins · Dec 23 '22 07:12

Bump: I'm seeing this bug too, on v0.10.10. Any workarounds would be helpful.

Setting ntasks=1 works, but it's slow.
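
For reference, a minimal sketch of that workaround. It assumes CSV.jl and DataFrames.jl are installed, and it uses a small generated file (a hypothetical stand-in, not the instagram_locations.csv from the report) so the snippet runs on its own. Passing ntasks=1 asks CSV.jl to parse on a single task, so the multithreaded post-parse step that appears in the stack trace should not be reached.

```julia
using CSV, DataFrames

# Hypothetical stand-in for the large file in the report.
path = tempname() * ".csv"
open(path, "w") do io
    println(io, "id,name")
    for i in 1:5_000
        println(io, i, ",loc", i)
    end
end

# ntasks=1 requests single-task parsing; limit caps the number of rows read.
df = CSV.read(path, DataFrame; limit=1000, ntasks=1)

describe(df, :eltype)
```

The trade-off is the one noted above: on a large file, single-task parsing gives up the speed of running on several threads.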

jariji · May 20 '23 01:05

Can either of you try on the latest main branch? We just merged a related fix.

quinnj · May 20 '23 02:05

No luck here. limit = 100_000 gives

nested task error: BoundsError: attempt to access 100000-element Vector{UInt32} at index [100001]

in the same place as shown in the OP.

jariji · May 20 '23 03:05