Improve error message when IO source exhausted
Hi, I'm using CSV.jl 0.10.4 and trying to read from an IOBuffer ... without much luck. See below for the MWE.
julia> source = IOBuffer()
IOBuffer(data=UInt8[...], readable=true, writable=true, seekable=true, append=false, size=0, maxsize=Inf, ptr=1, mark=-1)
julia> Downloads.download("https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv", source)
IOBuffer(data=UInt8[...], readable=true, writable=true, seekable=true, append=false, size=3858, maxsize=Inf, ptr=3859, mark=-1)
julia> CSV.File(source)
ERROR: BoundsError: attempt to access 3858-element Vector{UInt8} at index [3859]
Stacktrace:
[1] getindex
@ ./array.jl:861 [inlined]
[2] consumeBOM
@ ~/.julia/packages/CSV/0Elut/src/utils.jl:246 [inlined]
[3] CSV.Context(source::CSV.Arg, header::CSV.Arg, normalizenames::CSV.Arg, datarow::CSV.Arg, skipto::CSV.Arg, footerskip::CSV.Arg, transpose::CSV.Arg, comment::CSV.Arg, ignoreemptyrows::CSV.Arg, ignoreemptylines::CSV.Arg, select::CSV.Arg, drop::CSV.Arg, limit::CSV.Arg, buffer_in_memory::CSV.Arg, threaded::CSV.Arg, ntasks::CSV.Arg, tasks::CSV.Arg, rows_to_check::CSV.Arg, lines_to_check::CSV.Arg, missingstrings::CSV.Arg, missingstring::CSV.Arg, delim::CSV.Arg, ignorerepeated::CSV.Arg, quoted::CSV.Arg, quotechar::CSV.Arg, openquotechar::CSV.Arg, closequotechar::CSV.Arg, escapechar::CSV.Arg, dateformat::CSV.Arg, dateformats::CSV.Arg, decimal::CSV.Arg, truestrings::CSV.Arg, falsestrings::CSV.Arg, stripwhitespace::CSV.Arg, type::CSV.Arg, types::CSV.Arg, typemap::CSV.Arg, pool::CSV.Arg, downcast::CSV.Arg, lazystrings::CSV.Arg, stringtype::CSV.Arg, strict::CSV.Arg, silencewarnings::CSV.Arg, maxwarnings::CSV.Arg, debug::CSV.Arg, parsingdebug::CSV.Arg, validate::CSV.Arg, streaming::CSV.Arg)
@ CSV ~/.julia/packages/CSV/0Elut/src/context.jl:309
[4] #File#25
@ ~/.julia/packages/CSV/0Elut/src/file.jl:221 [inlined]
[5] CSV.File(source::IOBuffer)
@ CSV ~/.julia/packages/CSV/0Elut/src/file.jl:221
[6] top-level scope
@ REPL[10]:1
Parsing starts from the current position of the IOBuffer, so you probably need to call seekstart after downloading and before reading. We could maybe have a better error here, though.
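For anyone else landing here, a minimal sketch of the workaround. To keep it self-contained it fills the buffer with `write` and made-up sample data instead of `Downloads.download` (that substitution is my assumption); either way the position is left at the end of the buffer.

```julia
using CSV

source = IOBuffer()
write(source, "a,b\n1,2\n3,4\n")   # stand-in for the download; sample data is made up

# After writing, the read/write position sits at the end of the buffer,
# so there is nothing left for a reader to consume.
@show position(source) eof(source)

seekstart(source)                   # rewind to the beginning before parsing
file = CSV.File(source)
```

After the `seekstart` call, `CSV.File` sees the whole buffer instead of an exhausted one.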
Ah, yep that's the issue. A better error message would certainly be helpful though :+1:.
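To make the request concrete, here is a rough sketch of the kind of check I have in mind. `check_not_exhausted` is a hypothetical helper, not something that exists in CSV.jl, and the message wording is only a suggestion. It uses nothing beyond `eof`, `position`, and `seekstart` from Base.

```julia
# Hypothetical pre-check; not part of CSV.jl's API.
# Throws a descriptive error when a non-empty IOBuffer has already been read
# (or written) to its end, instead of letting a BoundsError surface later.
function check_not_exhausted(io::IOBuffer)
    if eof(io) && position(io) > 0
        throw(ArgumentError(
            "IO source is at its end (position $(position(io))), so there is " *
            "nothing left to parse. Call `seekstart(io)` first if you meant to " *
            "read from the beginning."))
    end
    return io
end

buf = IOBuffer()
write(buf, "a,b\n1,2\n")

# The buffer is exhausted here, so the check reports it clearly.
err = try
    check_not_exhausted(buf)
    nothing
catch e
    e
end

seekstart(buf)
ok = check_not_exhausted(buf)   # passes once the buffer has been rewound
```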
I just ran into this because the docs explicitly mention the option to pass an IO. In my case I don't think I save much time (SSDs are fast), but we should definitely improve the error message. I don't know much about buffers or stream positions, but would it make sense to call seekstart within CSV.File in certain cases? For example, if we see that the position is already at the end when CSV.File is called (which is not really meaningful).
I can't see how "if at end of IO, call seekstart" could hurt; that might be worth adding as a heuristic?
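A sketch of what that heuristic could look like, assuming it is limited to `IOBuffer`. `rewind_if_exhausted` is a hypothetical name, not an existing CSV.jl function. I restricted it to `IOBuffer` on purpose: `eof` can block on streams that are still waiting for data, and not every IO is seekable.

```julia
# Hypothetical heuristic; not part of CSV.jl.
# If a non-empty buffer is already at its end, rewind it so parsing can start
# from the beginning. A partially consumed buffer is left untouched.
function rewind_if_exhausted(io::IOBuffer)
    if eof(io) && position(io) > 0
        seekstart(io)
    end
    return io
end

# Exhausted buffer: gets rewound.
full = IOBuffer()
write(full, "x,y\n1,2\n")
rewind_if_exhausted(full)
header = readline(full)

# Partially read buffer: position is preserved.
partial = IOBuffer("abc")
read(partial, UInt8)
rewind_if_exhausted(partial)
```

One trade-off to keep in mind: rewinding silently changes what "current position" means for callers who really did intend to pass an exhausted buffer, so a clear error may still be the safer default.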