Dictionaries.jl
Dictionaries.jl copied to clipboard
Typocalypse-free mapping
Do you have a plan to remove all return_type from the API-visible part (including empty input)? Could you share your plan if you have one?
Also, I don't think it's possible to specify the eltype for Iterators.map(f, d::AbstractDictionary). I'd suggest using EltypeUnknown.
Excellent question @tkf.
Yes, I have a plan. Since I have found good success with SizeUnknown for Iterators.filter, I'd now like use EltypeUnknown for Iterators.map, as you say. :)
This will mean removing the I and T type parameters from AbstractDictionary{I, T}. These will become proper traits - IteratorElType andeltype for values, and, I dunno, keytype and ... IteratorEltype(keys(dict)) for keys? The concrete types can of course have type parameters. It's a non-trivial but doable amount of work. (If it works well I'd go as far as suggesting removing T from AbstractArray{T} for Julia 2.0, as that pesky T causes difficulties with lazy adjoints and other things already).
Honestly, I'm not 100% sure about I yet, either. It's not necessary for map since that modifies the values not the indices. But there's probably no point keeping it around.
(including empty input)
As for the empty collections... I have some thoughts but I'd really love to hear any ideas?
removing
TfromAbstractArray{T}for Julia 2.0
Oh my. I was thinking I was return_type-extremist :laughing:
This will mean removing the
IandTtype parameters fromAbstractDictionary{I, T}.
I was actually not thinking to remove these type parameters (even eltype). I was thinking something like
IteratorEltype(::Type{<:AbstractDictionary{<:Any, T}}) where {T} = HasEltype()
IteratorEltype(::Type{<:AbstractDictionary}) = EltypeUnknown()
IteratorEltype(::Type{<:AbstractDictionary{<:Any, nothing}}) = EltypeUnknown()
so that you'd use MappedDictionary{I, ...} <: AbstractDictionary{I, nothing} or something like that.
As for the empty collections... I have some thoughts but I'd really love to hear any ideas?
If you want to keep HasEltype behavior, I'd use AbstractDictionary{I, Union{}}. That's something I've been using in Transducers.jl/BangBang.jl. I'm interested in your thought on this, too.
Oh my. I was thinking I was
return_type-extremist :laughing:
:laughing: Yeah it was partly a joke. Honestly I'm an "extremist" in that we should either support/encourage it or remove it - the current middle ground seems painful. I personally think encouraging it would be better but I'm too far removed from the type inference implementation to have a valid opinion.
I was thinking something like
Hmm... it's interesting but I'm worried putting nothing there could potentially make life harder than removing it entirely. Dispatch could be messy. It's not clear if that type position is meant to be a type or a constant. Etc.
Regarding Union{} for empty collections I think this works out really well for "functional" style code and we use it in StaticArrays. For variable length collections you end up with things like Union{Indices{Union{}}. Indices{T}} coming out of inference distinct, etc. When the eltype is Union{T, Missing} you get Union{Indices{Union{}}. Indices{T}, Indices{Missing},Indices{Union{Missing,T}}} which is widened by inference (to Indices{<:Any} I believe?). So too slow for common use cases :(
I sometimes feel the language would work well if struct and Array were covariant and inference could simply widen the above to Indices{Union{Missing,T}} instead.
(Furthermore, for non-empty collections there's an unfortunate overhead in Union{} being treated as a box rather than a singleton. As in Vector{Union{}}(undef, 1_000_000) allocates 8 MB instead of a few bytes. I think this can be fixed.)
either support/encourage it or remove it - the current middle ground seems painful
Yeah, I agree it's painful ATM. But I think it is worth trying to make the current state as usable as possible. This would let us see the logical conclusion of the current API (which means to embrace mutate-or-widen all the way down, I guess). Maybe it'd prompt compiler guys to create a better return_type (e.g., rigorously define "inferable" subset of Julia and then provide tryinfer(f, types) :: Union{Some{<:Type}, Nothing}).
That said, I'm actually kind of OK with current API of return_type where you are only "allowed" to use for optimizations. Rather, "in return", I'd like to have more ways to force/control union split of the compiler (like tail-call elimination API).
but I'm worried putting
nothingthere could potentially make life harder than removing it entirely
Another more "down-to-earth" API might be AbstractDictionary{I, T, IEltype <: IteratorEltype, TEltype <: IteratorEltype}. But I just thought you might not want too many type parameters.
When the eltype is
Union{T, Missing}you getUnion{Indices{Union{}}. Indices{T}, Indices{Missing},Indices{Union{Missing,T}}}which is widened by inference (toIndices{<:Any}I believe?).
Isn't this solvable if you write "tail-call function-barrier" pattern?
So too slow for common use cases :(
If you have the typocalypse-free base implementation, it's always OK to use return_type-based optimized path, right? I guess the common cases are inferable ones?
if
structandArraywere covariant and inference could simply widen the above toIndices{Union{Missing,T}}instead
Hm... interesting. What would be the difference to union splitting in this approach? In particular, even with covariant Indices, don't f(::Indices{Union{Missing,T}}) still need to run f(::Indices{T}) at machine code level? Maybe it's more like pushing the union splitting to the callee side or something?
there's an unfortunate overhead in
Union{}
Yeah, it'd be nice to get rid of this. Meanwhile, I just created MicroCollections.UndefArray etc. as a workaround.
but I'm worried putting
nothingthere could potentially make life harder than removing it entirelyAnother more "down-to-earth" API might be
AbstractDictionary{I, T, IEltype <: IteratorEltype, TEltype <: IteratorEltype}.
Or maybe something like this?
abstract type AbstractIndexable{I,T} end
abstract type AbstractDictionary{I,T} <: AbstractIndexable{Some{I},Some{T}} end
IteratorEltype(::Type{AbstractIndexable{<:Any,Nothing}}) = EltypeUnknown()
IteratorEltype(::Type{AbstractIndexable{<:Any,<:Some}}) = HasEltype()
struct MappedIndexable{I} <: AbstractIndexable{Some{I},Nothing}
...
end
It's a bit unusual usage of Some{T}. But it might be better than putting the constant nothing in the type parameter?
It's a bit unusual usage of Some{T}. But it might be better than putting the constant nothing in the type parameter?
Yes, certainly seems plausible though. It's an extra layer of complication, but oh well.
I guess I kinda like traits better than type parameters? I think they make for better dispatch patterns.
Expanding on that - I'd mostly consider it an antipattern to dispatch on the element type of an abstract container.
Exceptions like calling BLAS for multiplying Matrix{Float64} would be better as _matmul(::LayoutTrait, ::Eltype, a::AbstractMatrix, b::AbstractMatrix). So eltype is just a trait. This avoids the StridedArray fiasco. Last time I did it editing the dispatch tables for matrix multiplication in LinearAlgebra was a huge minefield. And the result here is extensible (for new methods, or new matrix types, or even new layouts) with lower (IMO) risk of method ambiguity or user confusion and subtly helps enforce consistency accross the AbstractMatrix interface.
I guess I don't mind if it uses trait and has no type parameters. Type parameters make it easy to implement traits "for free" but I guess that's just an implementation detail that does not have to leak out.
I've also noticed this issue and, while it's true that fixing it for Iterators.map is a bit tricky (because of the type parameters of the lazy mapped dictionary), I was wondering: would it be easier to fix it for the regular Base.map in the meantime?
You could overload map_with_eltype from my package LazyMapWithElType. It's just like Iterators.map, except that it also accepts a return type parameter.
I don't think there can be a much better option until map and Iterators.map get fixed in Base.