julia Add a set-up step before `iterate`

Currently, when one writes a loop over an iterator, it gets lowered from

for i in v
    ...
end

to the equivalent of

res = iterate(v)
while !isnothing(res)
    i, state = res
    ...
    res = iterate(v, state)
end

I wonder if there'd be any potential problems with generalizing this a bit by adding a function Base.to_iterator (or whatever) with a default method that is just identity, and then we could lower the loop down to

itr = Base.to_iterator(v) # <------ Setup step, default method just returns v
res = iterate(itr)
while !isnothing(res)
    i, state = res
    ...
    res = iterate(itr, state)
end

This should not change anything about existing iterators, but I believe that for certain cases, this would afford us some interesting optimization opportunities, and make the implementation of some iterators easier.

Some iteration protocols are just easier (or more efficient) to express in terms of a mutable object, but it can be wasteful to have to carry around a mutable container in your type to hold state just in case someone wants to iterate over it. In this way, if it's set up and torn down automatically, there are opportunities for the compiler to stack allocate all the mutable containers.

An example usecase:

Suppose I have the type

struct VectorOfVectors{T} <:AbstractVector{T}
    data::Vector{Vector{T}}
end

which I want to basically treat as a flat vector semantically.

Implementing iterate on this type is pretty painful, it requires juggling an inner state and an outer state for the outer vectors and the inner vectors, and trying to do this has a large chance of being wrong, or suboptimal. It's especially frustrating to write out this iterator, because Base.Iterators has already provided the wanted iteration protocol for this type: Iterators.flatten.

If we had a setup step like the one proposed, someone could just write

Base.to_iterator(vov::VectorOfVectors) = Iterators.flatten(vov)

and be done with it. Currently, it seems (thanks @vtjnash for pointing out this is possible) that the current best way to re-use the work already done in base would be to write something like this:

function Base.iterate(vov::VectorOfVectors)
    itr = Iterators.flatten(vov.data)
    res = iterate(itr)
    isnothing(res) && return nothing
    x, state = res
    return x, (itr, state)
end

function Base.iterate(vov::VectorOfVectors, (itr, state))
    res = iterate(itr, state)
    isnothing(res) && return nothing
    x, state = res
    return x, (itr, state)
end

which is considerably more work, and is just a bunch of pointless boiler-plate just to forward an iteration specification to an existing iterator.

Sep 16 '22 16:09 MasonProtter

As pointed out in Mike Innes' post Iterate on It, this also allows you to create a destructive iterator, then GC the original object.

Sep 16 '22 17:09 jakobnissen

It is unnecessary, since if you have a stateful iterator, you just return that as the state when making the first iteration step

Sep 16 '22 19:09 vtjnash

Hm, that would make my original use-case somewhat unnecessary (although significantly uglier to implement), but it wouldn't fix the use-case from Mike's blog post (thanks @jakobnissen I had forgotten about this!).

If I used your trick @vtjnash with the state, there would still be a reference to the linked list throughout the whole loop, until it actually terminates, whereas if you use to_iterator to create a destructive stateful iterator, then v can be gc'd if it isn't used anymore after the loop.

Sep 16 '22 20:09 MasonProtter

there is no reference if iterator does not use the argument

Sep 16 '22 21:09 vtjnash

Does that rely on iterate inlining?

Sep 18 '22 18:09 MasonProtter

Isn't this sort of going back to the old iteration protocol..?

Sep 19 '22 21:09 Seelengrab

Not really, no. It's just an optional hook that lets you choose what object you actually iterate over.

Sep 19 '22 22:09 MasonProtter

GC'ing appears to rely on inlining. In the following example:

mutable struct Foo
    x::Int

    function Foo(x::Int)
        y = new(x)
        println("Creating $(repr(objectid(y)))")
        finalizer(i -> println("Destroying $(repr(objectid(i)))"), y)
    end
end

Base.length(x::Foo) = x.x
Base.iterate(x::Foo) = (x.x, x.x-1)

@noinline function Base.iterate(x::Foo, state)
    iszero(state) && return nothing
    GC.gc()
    (state, state-1)
end

f(x) = foreach(println, Foo(x))

No GC is performed during the loop t.f(5), but it is when @noinline is removed. Further, it will not GC if Base.iterate is written as a single method with a default state argument, as is idiomatic.

IMO, if it was possible to GC in the un-inlined case, then Jameson is correct and there would be no need for a setup function.

Sep 20 '22 07:09 jakobnissen

Just wanted to point out that this was discussed at the time, so there's some extensive discussion here: https://github.com/JuliaLang/julia/pull/25261#issuecomment-353728585. In general, I'm not necessarily opposed to it, but I'm not sure the justification is all that strong, since I don't think it would let you implement anything that couldn't be implemented right now, while simultaneously imposing an extra method resolution on every single loop in the system (which isn't terrible, but should have some reason for it).

Sep 21 '22 06:09 Keno

GC'ing appears to rely on inlining. In the following example:

~~I'm running this on 1.8.2 and I do see the GC run in this example, even with @inline and a default argument, so I guess that solves it.~~

Edit: Nvm, I think I was doing something wrong.

Oct 20 '22 21:10 MasonProtter

Fixed I guess with SetupIterator.jl

mutable struct Foo
    x::Int

    function Foo(x::Int)
        y = new(x)
        println("Creating $(repr(objectid(y)))")
        finalizer(i -> println("Destroying $(repr(objectid(i)))"), y)
    end
end
Base.length(x::Foo) = x.x

using SetupIterator
SetupIterator.iterator(f::Foo) = f.x:-1:1
@setup_iterate Foo

julia> function f(n)
           for x ∈ Foo(n)
               GC.gc()
               @show x
           end
       end
f (generic function with 1 method)

julia> f(5)
Creating 0x7dbd75374892a811
Destroying 0x7dbd75374892a811
x = 5
x = 4
x = 3
x = 2
x = 1

Oct 23 '22 21:10 MasonProtter

julia julia copied to clipboard

Add a set-up step before `iterate`

julia
julia copied to clipboard