DelimitedFiles.jl icon indicating copy to clipboard operation
DelimitedFiles.jl copied to clipboard

Custom iterator breaks on last row of a readdlm-loaded array

Open loleg opened this issue 7 years ago • 5 comments

When using a custom iterator over an array created with readdlm, the last row is broken in Julia 1.0. Yes, I did test this with no warnings in 0.7. This behavior looks like a bug to me and possibly related to JuliaLang/julia#28763

using DelimitedFiles # for readdlm

mutable struct mytbl # a custom object
    source
    function mytbl(csvdata::Base.GenericIOBuffer)
        source = readdlm(csvdata, ',')
        source = convert(Array, source[2:end,:]) # clear the headers
        new(source)
    end
end

Base.length(it::mytbl) = size(it.source, 1)

function Base.iterate(it::mytbl, (el, i)=(it.source[1,:], 1))
   return i >= length(it) ? nothing : (el, (it.source[i + 1,:], i + 1))
end

# some sample data
TABLE_CAST = """id,height,age,name,occupation
1,10.0,1,string1,2012-06-15 00:00:00
2,10.1,2,string2,2013-06-15 01:00:00
"""

table =  mytbl(IOBuffer(TABLE_CAST))

[ row for row in table ]
# Returns: last element is #undef
# Expected: the last element is the second row

[ row[1] for row in table ]
# Returns:
[1, 239486056]
# Expected: [1, 2]

[ row[2] for row in table ]
# Returns:
[10.0, 1.18e-315]
# Expected: [10.0, 10.1]

Note that I've tested this with more elements, the last row is always..well, broken.

Further discussion and a workaround in https://discourse.julialang.org/t/last-row-is-broken-when-iterating-an-array/15696

loleg avatar Sep 30 '18 16:09 loleg

Isn't this just a faulty iterate definition? It says that length should be equal to size(it.source, 1) but it returns nothing after size(it.source, 1)-1. Using something like

function Base.iterate(it::mytbl, i=1)
    return i > length(it) ? nothing : (it.source[i, :], (i+1))
end

we get

julia> [ row for row in table ]
2-element Array{Array{Any,1},1}:
 [1, 10.0, 1, "string1", "2012-06-15 00:00:00"]
 [2, 10.1, 2, "string2", "2013-06-15 01:00:00"]

julia> [ row[1] for row in table ]
2-element Array{Int64,1}:
 1
 2

julia> [ row[2] for row in table ]
2-element Array{Float64,1}:
 10.0
 10.1

KristofferC avatar Sep 30 '18 16:09 KristofferC

There I suppose there's two things going on here: instead of nothing the last item returned is a weird undefined row with strange values when accessed by index. And the documentation on iterators could use some better examples, like yours @KristofferC

loleg avatar Sep 30 '18 19:09 loleg

Maybe the array comprehension code could check that the number of collected elements matches the declared length for HasLength/HasShape iterators? That would catch errors like this.

nalimilan avatar Sep 30 '18 19:09 nalimilan

Yes, this is very similar to JuliaLang/julia#28763. Adding a check for iterating fewer than length elements sounds good; that at least fixes the problem in one direction.

JeffBezanson avatar Oct 01 '18 16:10 JeffBezanson

https://github.com/JuliaLang/julia/pull/29458 catches this:

julia> [ row for row in table ]
ERROR: iterator returned fewer elements than its declared length
Stacktrace:
 [1] collect_to!(::Array{Array{Any,1},1}, ::Base.Generator{mytbl,getfield(Main, Symbol("##3#4"))}, ::Int64, ::Tuple{Array{Any,1},Int64}) at ./array.jl:690
 [2] collect_to_with_first!(::Array{Array{Any,1},1}, ::Array{Any,1}, ::Base.Generator{mytbl,getfield(Main, Symbol("##3#4"))}, ::Tuple{Array{Any,1},Int64}) at ./array.jl:663
 [3] collect(::Base.Generator{mytbl,getfield(Main, Symbol("##3#4"))}) at ./array.jl:644
 [4] top-level scope at none:0

julia> [ row[1] for row in table ]
ERROR: iterator returned fewer elements than its declared length
Stacktrace:
 [1] collect_to!(::Array{Int64,1}, ::Base.Generator{mytbl,getfield(Main, Symbol("##5#6"))}, ::Int64, ::Tuple{Array{Any,1},Int64}) at ./array.jl:690
 [2] collect_to_with_first!(::Array{Int64,1}, ::Int64, ::Base.Generator{mytbl,getfield(Main, Symbol("##5#6"))}, ::Tuple{Array{Any,1},Int64}) at./array.jl:663
 [3] collect(::Base.Generator{mytbl,getfield(Main, Symbol("##5#6"))}) at ./array.jl:644
 [4] top-level scope at none:0

nalimilan avatar Oct 01 '18 19:10 nalimilan