CSV.jl

Improve error message when IO source exhausted

Open · tecosaur opened this issue 3 years ago · 4 comments

Hi, I'm using CSV.jl 0.10.4 and trying to read from an IOBuffer ... without much luck. See below for the MWE.

julia> source = IOBuffer()
IOBuffer(data=UInt8[...], readable=true, writable=true, seekable=true, append=false, size=0, maxsize=Inf, ptr=1, mark=-1)

julia> Downloads.download("https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv", source)
IOBuffer(data=UInt8[...], readable=true, writable=true, seekable=true, append=false, size=3858, maxsize=Inf, ptr=3859, mark=-1)

julia> CSV.File(source)
ERROR: BoundsError: attempt to access 3858-element Vector{UInt8} at index [3859]
Stacktrace:
 [1] getindex
   @ ./array.jl:861 [inlined]
 [2] consumeBOM
   @ ~/.julia/packages/CSV/0Elut/src/utils.jl:246 [inlined]
 [3] CSV.Context(source::CSV.Arg, header::CSV.Arg, normalizenames::CSV.Arg, datarow::CSV.Arg, skipto::CSV.Arg, footerskip::CSV.Arg, transpose::CSV.Arg, comment::CSV.Arg, ignoreemptyrows::CSV.Arg, ignoreemptylines::CSV.Arg, select::CSV.Arg, drop::CSV.Arg, limit::CSV.Arg, buffer_in_memory::CSV.Arg, threaded::CSV.Arg, ntasks::CSV.Arg, tasks::CSV.Arg, rows_to_check::CSV.Arg, lines_to_check::CSV.Arg, missingstrings::CSV.Arg, missingstring::CSV.Arg, delim::CSV.Arg, ignorerepeated::CSV.Arg, quoted::CSV.Arg, quotechar::CSV.Arg, openquotechar::CSV.Arg, closequotechar::CSV.Arg, escapechar::CSV.Arg, dateformat::CSV.Arg, dateformats::CSV.Arg, decimal::CSV.Arg, truestrings::CSV.Arg, falsestrings::CSV.Arg, stripwhitespace::CSV.Arg, type::CSV.Arg, types::CSV.Arg, typemap::CSV.Arg, pool::CSV.Arg, downcast::CSV.Arg, lazystrings::CSV.Arg, stringtype::CSV.Arg, strict::CSV.Arg, silencewarnings::CSV.Arg, maxwarnings::CSV.Arg, debug::CSV.Arg, parsingdebug::CSV.Arg, validate::CSV.Arg, streaming::CSV.Arg)
   @ CSV ~/.julia/packages/CSV/0Elut/src/context.jl:309
 [4] #File#25
   @ ~/.julia/packages/CSV/0Elut/src/file.jl:221 [inlined]
 [5] CSV.File(source::IOBuffer)
   @ CSV ~/.julia/packages/CSV/0Elut/src/file.jl:221
 [6] top-level scope
   @ REPL[10]:1

tecosaur, Apr 23 '22 14:04

Parsing will start from the current position of the IOBuffer, so you probably need to call seekstart after downloading and before reading. We can maybe have a better error here, though.
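A minimal sketch of why this happens, using only Base so it runs without network access. A plain `write` stands in for `Downloads.download(url, io)` (an assumption that both leave the position at the end of the written data, which matches the `ptr=3859` on a `size=3858` buffer in the REPL output above). The CSV text is made up for illustration.

```julia
# Stand-in for Downloads.download(url, io): writing into an IOBuffer
# moves its position to the end of the data just written.
io = IOBuffer()
nbytes = write(io, "sepal_length,species\n5.1,setosa\n")

# The bytes are all in the buffer, but a reader starting from the current
# position (as CSV.File does) finds nothing left to read.
leftover = read(io, String)          # ""

# Rewind to the first byte before handing the buffer to a parser.
seekstart(io)
@show position(io) eof(io)           # position 0, eof false
# With CSV.jl loaded, `CSV.File(io)` would now parse from the beginning.
```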

quinnj, Apr 23 '22 14:04

Ah, yep that's the issue. A better error message would certainly be helpful though :+1:.
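One possible shape for a clearer error, sketched as a hypothetical helper (`check_not_exhausted` is not part of CSV.jl). It is restricted to `IOBuffer` and only fires when the buffer holds data but the position is already past all of it, so a genuinely empty input is still let through.

```julia
# Hypothetical pre-flight check, not CSV.jl API: fail early with an
# actionable message instead of a BoundsError deep inside the parser.
function check_not_exhausted(io::IOBuffer)
    # Data was present, but the read position is already at the end.
    if eof(io) && position(io) > 0
        throw(ArgumentError(
            "IOBuffer source is exhausted: its position is at the end " *
            "($(position(io)) bytes already consumed). If you just wrote or " *
            "downloaded into it, call `seekstart(io)` before parsing."))
    end
    return io
end

io = IOBuffer()
write(io, "a,b\n1,2\n")
err = try
    check_not_exhausted(io)
    nothing
catch e
    e
end
# err is an ArgumentError whose message suggests `seekstart(io)`

seekstart(io)
check_not_exhausted(io)   # passes: position 0, data ahead
```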

tecosaur, Apr 23 '22 15:04

I just ran into this because the docs explicitly mention the option of passing an IO. In my case I don't think I save much time (SSDs are fast), but we should definitely improve the error message. I have no understanding of buffers and positions, but would it make sense to call seekstart (within CSV.File) in certain cases? For example, if the position is already at the end when CSV.File is called (reading from there is not really meaningful).

kafisatz, Oct 12 '22 07:10

I can't see how "if at the end of the IO, call seekstart" could hurt; that might be worth adding as a heuristic?
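The proposed heuristic could look roughly like the sketch below. `rewind_if_exhausted!` is a hypothetical name, not CSV.jl behaviour, and limiting it to `IOBuffer` and `IOStream` is an assumption: those are seekable, whereas pipes and sockets cannot be rewound, so they are passed through untouched.

```julia
# Hypothetical wrapper illustrating the proposed heuristic; not CSV.jl API.
const RewindableIO = Union{IOBuffer, IOStream}

function rewind_if_exhausted!(io::RewindableIO)
    # Rewind only when data was consumed and nothing is left; a partially
    # read stream keeps its position so a deliberately skipped preamble stays skipped.
    if eof(io) && position(io) > 0
        seekstart(io)
    end
    return io
end

# Non-seekable streams (pipes, sockets): leave them alone.
rewind_if_exhausted!(io::IO) = io

io = IOBuffer()
write(io, "x,y\n1,2\n")
rewind_if_exhausted!(io)      # position was at the end, so it is reset to 0

partial = IOBuffer("x,y\n1,2\n")
readline(partial)             # consume the header line only
rewind_if_exhausted!(partial) # not at the end, so position stays at 4
```

One trade-off worth noting: a caller who intentionally read a buffer to the end and expected an empty result would now get the whole buffer re-parsed, which is why an explicit error may be the more conservative option.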

tecosaur, Oct 12 '22 07:10