Custom iterator breaks on last row of a readdlm-loaded array
When using a custom iterator over an array created with readdlm, the last row is broken in Julia 1.0. Yes, I did test this with no warnings in 0.7. This behavior looks like a bug to me and possibly related to JuliaLang/julia#28763
using DelimitedFiles # for readdlm
mutable struct mytbl # a custom object
source
function mytbl(csvdata::Base.GenericIOBuffer)
source = readdlm(csvdata, ',')
source = convert(Array, source[2:end,:]) # clear the headers
new(source)
end
end
Base.length(it::mytbl) = size(it.source, 1)
function Base.iterate(it::mytbl, (el, i)=(it.source[1,:], 1))
return i >= length(it) ? nothing : (el, (it.source[i + 1,:], i + 1))
end
# some sample data
TABLE_CAST = """id,height,age,name,occupation
1,10.0,1,string1,2012-06-15 00:00:00
2,10.1,2,string2,2013-06-15 01:00:00
"""
table = mytbl(IOBuffer(TABLE_CAST))
[ row for row in table ]
# Returns: last element is #undef
# Expected: the last element is the second row
[ row[1] for row in table ]
# Returns:
[1, 239486056]
# Expected: [1, 2]
[ row[2] for row in table ]
# Returns:
[10.0, 1.18e-315]
# Expected: [10.0, 10.1]
Note that I've tested this with more elements, the last row is always..well, broken.
Further discussion and a workaround in https://discourse.julialang.org/t/last-row-is-broken-when-iterating-an-array/15696
Isn't this just a faulty iterate definition? It says that length should be equal to size(it.source, 1) but it returns nothing after size(it.source, 1)-1. Using something like
function Base.iterate(it::mytbl, i=1)
return i > length(it) ? nothing : (it.source[i, :], (i+1))
end
we get
julia> [ row for row in table ]
2-element Array{Array{Any,1},1}:
[1, 10.0, 1, "string1", "2012-06-15 00:00:00"]
[2, 10.1, 2, "string2", "2013-06-15 01:00:00"]
julia> [ row[1] for row in table ]
2-element Array{Int64,1}:
1
2
julia> [ row[2] for row in table ]
2-element Array{Float64,1}:
10.0
10.1
There I suppose there's two things going on here: instead of nothing the last item returned is a weird undefined row with strange values when accessed by index. And the documentation on iterators could use some better examples, like yours @KristofferC
Maybe the array comprehension code could check that the number of collected elements matches the declared length for HasLength/HasShape iterators? That would catch errors like this.
Yes, this is very similar to JuliaLang/julia#28763. Adding a check for iterating fewer than length elements sounds good; that at least fixes the problem in one direction.
https://github.com/JuliaLang/julia/pull/29458 catches this:
julia> [ row for row in table ]
ERROR: iterator returned fewer elements than its declared length
Stacktrace:
[1] collect_to!(::Array{Array{Any,1},1}, ::Base.Generator{mytbl,getfield(Main, Symbol("##3#4"))}, ::Int64, ::Tuple{Array{Any,1},Int64}) at ./array.jl:690
[2] collect_to_with_first!(::Array{Array{Any,1},1}, ::Array{Any,1}, ::Base.Generator{mytbl,getfield(Main, Symbol("##3#4"))}, ::Tuple{Array{Any,1},Int64}) at ./array.jl:663
[3] collect(::Base.Generator{mytbl,getfield(Main, Symbol("##3#4"))}) at ./array.jl:644
[4] top-level scope at none:0
julia> [ row[1] for row in table ]
ERROR: iterator returned fewer elements than its declared length
Stacktrace:
[1] collect_to!(::Array{Int64,1}, ::Base.Generator{mytbl,getfield(Main, Symbol("##5#6"))}, ::Int64, ::Tuple{Array{Any,1},Int64}) at ./array.jl:690
[2] collect_to_with_first!(::Array{Int64,1}, ::Int64, ::Base.Generator{mytbl,getfield(Main, Symbol("##5#6"))}, ::Tuple{Array{Any,1},Int64}) at./array.jl:663
[3] collect(::Base.Generator{mytbl,getfield(Main, Symbol("##5#6"))}) at ./array.jl:644
[4] top-level scope at none:0