Create `infer_eltype` function
Addresses #54157
This PR adds a function which can be useful when one wants to infer the eltype of a collection with only a loose eltype but which could be made tighter by using type inference, for example:
julia> struct A end
julia> struct B end
julia> itr = (x < 5 ? A() : B() for x in 1:10);
julia> eltype(itr)
Any
julia> infer_eltype(itr)
Union{A, B}
There is an older discussion about exposing Julia's type inference as API with a mechanism like Base.result_type. This overlaps substantially
I think you are referring to https://github.com/JuliaLang/julia/issues/54207, right? I would love to see an user-facing type inference API for sure, at the same time I think a function like this one could be also part of that API so maybe a PR which addresses just that could be accepted nonetheless (maybe without exporting the function at first)
I really regret having any form of return_type or default_eltype etc. but here we are. Putting that aside, I think the biggest problem with this function is knowing when to use it instead of just eltype. The standard design would be for an iterator to decide how its eltype is computed, not the user. If it's ok to use inference to determine the eltype, then iterators where that makes sense should just do it. Generator is of course an example of that since it calls an arbitrary function. Another possible design is to have the fallback eltype method call this for iterators that are EltypeUnknown.
To elaborate on Jeff's point: adding a new function that is like eltype but gives a different answer is not great—when are you supposed to use eltype versus infer_eltype? It's more possible to change what eltype returns in certain cases. One case where it might be quite helpful is to use inference to give a tighter eltype than Any for generators. Another case might be to change the default eltype in general to use inference. Of course, such changes to the behaviors of eltype would be have to well justified and thought through.
I also considered to use inference by default, but some comments (cc. @nsajko) made me realize that inference is not totally reliable and so maybe having the user to acknowledge that with a new function like infer_eltype would have been good...But here they come my (probably basic) questions: how much is inference unreliable? Can it happen that the same call returns Any on a machine and the concrete type on another? What about different calls at different places in the code? When will this happen? Can it be considered "rare enough"?
how much is inference unreliable?
See, e.g., #35800, #45388, #50735.
The problem is not it being "unreliable", but that (1) we want to be able to change it between versions (e.g. to trade off precision and compile times), and (2) it's nice to be able to run programs in an interpreter and get the same results. The results also depend on arbitrary heuristics like how many Union elements we want to allow.
From traige:
Nice job on the implementation, but from a language design perspective, we want to move away from relying on inference for semantics.
Nice job on the implementation
thanks!
I hope there is then a solution which doesn't involve inference. Is there?
There is are solutions which don't involve inference. Though they don't infer the type for you and instead you have to explicitly declare it. For example:
julia> struct EltypeIter{T,P}
parent::P
end
julia> EltypeIter{T}(x) where T = EltypeIter{T, typeof(x)}(x)
julia> Base.eltype(::EltypeIter{T}) where T = T
julia> Base.iterate(x::EltypeIter, i...) = iterate(x.parent, i...)
julia> Base.IteratorSize(::EltypeIter{T, P}) where {T, P} = Base.IteratorSize(P)
julia> Base.length(x::EltypeIter) = length(x.parent)
julia> Base.size(x::EltypeIter) = size(x.parent)
julia> struct A end
julia> struct B end
julia> itr = (x < 5 ? A() : B() for x in 1:10);
julia> eltype(itr)
Any
julia> itr2 = EltypeIter{Union{A, B}}(itr)
EltypeIter{Union{A, B}, Base.Generator{UnitRange{Int64}, var"#5#6"}}(Base.Generator{UnitRange{Int64}, var"#5#6"}(var"#5#6"(), 1:10))
julia> eltype(itr2)
Union{A, B}
julia> collect(itr)
10-element Vector{Any}:
A()
A()
A()
A()
B()
B()
B()
B()
B()
B()
julia> collect(itr2)
10-element Vector{Union{A, B}}:
A()
A()
A()
A()
B()
B()
B()
B()
B()
B()
I'm becoming radicalized against eltype, too. It can be useful to overlay certain optimizations, but it really cannot be used to define the fundamental behaviors. Yes, dealing with values purely as you get them can be hard, but trying to do something smart based on eltype can easily lead you to a definitively wrong answer.
In other words, instead of trying to get a smarter eltype, we need to have smoother idioms for incremental widening based upon the values alone.