julia
julia copied to clipboard
Add a set-up step before `iterate`
Currently, when one writes a loop over an iterator, it gets lowered from
for i in v
...
end
to the equivalent of
res = iterate(v)
while !isnothing(res)
i, state = res
...
res = iterate(v, state)
end
I wonder if there'd be any potential problems with generalizing this a bit by adding a function Base.to_iterator (or whatever) with a default method that is just identity, and then we could lower the loop down to
itr = Base.to_iterator(v) # <------ Setup step, default method just returns v
res = iterate(itr)
while !isnothing(res)
i, state = res
...
res = iterate(itr, state)
end
This should not change anything about existing iterators, but I believe that for certain cases, this would afford us some interesting optimization opportunities, and make the implementation of some iterators easier.
Some iteration protocols are just easier (or more efficient) to express in terms of a mutable object, but it can be wasteful to have to carry around a mutable container in your type to hold state just in case someone wants to iterate over it. In this way, if it's set up and torn down automatically, there are opportunities for the compiler to stack allocate all the mutable containers.
An example usecase:
Suppose I have the type
struct VectorOfVectors{T} <:AbstractVector{T}
data::Vector{Vector{T}}
end
which I want to basically treat as a flat vector semantically.
Implementing iterate on this type is pretty painful, it requires juggling an inner state and an outer state for the outer vectors and the inner vectors, and trying to do this has a large chance of being wrong, or suboptimal. It's especially frustrating to write out this iterator, because Base.Iterators has already provided the wanted iteration protocol for this type: Iterators.flatten.
If we had a setup step like the one proposed, someone could just write
Base.to_iterator(vov::VectorOfVectors) = Iterators.flatten(vov)
and be done with it. Currently, it seems (thanks @vtjnash for pointing out this is possible) that the current best way to re-use the work already done in base would be to write something like this:
function Base.iterate(vov::VectorOfVectors)
itr = Iterators.flatten(vov.data)
res = iterate(itr)
isnothing(res) && return nothing
x, state = res
return x, (itr, state)
end
function Base.iterate(vov::VectorOfVectors, (itr, state))
res = iterate(itr, state)
isnothing(res) && return nothing
x, state = res
return x, (itr, state)
end
which is considerably more work, and is just a bunch of pointless boiler-plate just to forward an iteration specification to an existing iterator.
As pointed out in Mike Innes' post Iterate on It, this also allows you to create a destructive iterator, then GC the original object.
It is unnecessary, since if you have a stateful iterator, you just return that as the state when making the first iteration step
Hm, that would make my original use-case somewhat unnecessary (although significantly uglier to implement), but it wouldn't fix the use-case from Mike's blog post (thanks @jakobnissen I had forgotten about this!).
If I used your trick @vtjnash with the state, there would still be a reference to the linked list throughout the whole loop, until it actually terminates, whereas if you use to_iterator to create a destructive stateful iterator, then v can be gc'd if it isn't used anymore after the loop.
there is no reference if iterator does not use the argument
Does that rely on iterate inlining?
Isn't this sort of going back to the old iteration protocol..?
Not really, no. It's just an optional hook that lets you choose what object you actually iterate over.
GC'ing appears to rely on inlining. In the following example:
mutable struct Foo
x::Int
function Foo(x::Int)
y = new(x)
println("Creating $(repr(objectid(y)))")
finalizer(i -> println("Destroying $(repr(objectid(i)))"), y)
end
end
Base.length(x::Foo) = x.x
Base.iterate(x::Foo) = (x.x, x.x-1)
@noinline function Base.iterate(x::Foo, state)
iszero(state) && return nothing
GC.gc()
(state, state-1)
end
f(x) = foreach(println, Foo(x))
No GC is performed during the loop t.f(5), but it is when @noinline is removed. Further, it will not GC if Base.iterate is written as a single method with a default state argument, as is idiomatic.
IMO, if it was possible to GC in the un-inlined case, then Jameson is correct and there would be no need for a setup function.
Just wanted to point out that this was discussed at the time, so there's some extensive discussion here: https://github.com/JuliaLang/julia/pull/25261#issuecomment-353728585. In general, I'm not necessarily opposed to it, but I'm not sure the justification is all that strong, since I don't think it would let you implement anything that couldn't be implemented right now, while simultaneously imposing an extra method resolution on every single loop in the system (which isn't terrible, but should have some reason for it).
GC'ing appears to rely on inlining. In the following example:
~~I'm running this on 1.8.2 and I do see the GC run in this example, even with @inline and a default argument, so I guess that solves it.~~
Edit: Nvm, I think I was doing something wrong.
Fixed I guess with SetupIterator.jl
mutable struct Foo
x::Int
function Foo(x::Int)
y = new(x)
println("Creating $(repr(objectid(y)))")
finalizer(i -> println("Destroying $(repr(objectid(i)))"), y)
end
end
Base.length(x::Foo) = x.x
using SetupIterator
SetupIterator.iterator(f::Foo) = f.x:-1:1
@setup_iterate Foo
julia> function f(n)
for x ∈ Foo(n)
GC.gc()
@show x
end
end
f (generic function with 1 method)
julia> f(5)
Creating 0x7dbd75374892a811
Destroying 0x7dbd75374892a811
x = 5
x = 4
x = 3
x = 2
x = 1