Dictionaries.jl icon indicating copy to clipboard operation
Dictionaries.jl copied to clipboard

Typocalypse-free mapping

Open tkf opened this issue 5 years ago • 10 comments

Do you have a plan to remove all return_type from the API-visible part (including empty input)? Could you share your plan if you have one?

Also, I don't think it's possible to specify the eltype for Iterators.map(f, d::AbstractDictionary). I'd suggest using EltypeUnknown.

tkf avatar Jul 12 '20 21:07 tkf

Excellent question @tkf.

Yes, I have a plan. Since I have found good success with SizeUnknown for Iterators.filter, I'd now like use EltypeUnknown for Iterators.map, as you say. :)

This will mean removing the I and T type parameters from AbstractDictionary{I, T}. These will become proper traits - IteratorElType andeltype for values, and, I dunno, keytype and ... IteratorEltype(keys(dict)) for keys? The concrete types can of course have type parameters. It's a non-trivial but doable amount of work. (If it works well I'd go as far as suggesting removing T from AbstractArray{T} for Julia 2.0, as that pesky T causes difficulties with lazy adjoints and other things already).

Honestly, I'm not 100% sure about I yet, either. It's not necessary for map since that modifies the values not the indices. But there's probably no point keeping it around.

(including empty input)

As for the empty collections... I have some thoughts but I'd really love to hear any ideas?

andyferris avatar Jul 13 '20 00:07 andyferris

removing T from AbstractArray{T} for Julia 2.0

Oh my. I was thinking I was return_type-extremist :laughing:

This will mean removing the I and T type parameters from AbstractDictionary{I, T}.

I was actually not thinking to remove these type parameters (even eltype). I was thinking something like

IteratorEltype(::Type{<:AbstractDictionary{<:Any, T}}) where {T} = HasEltype()
IteratorEltype(::Type{<:AbstractDictionary}) = EltypeUnknown()
IteratorEltype(::Type{<:AbstractDictionary{<:Any, nothing}}) = EltypeUnknown()

so that you'd use MappedDictionary{I, ...} <: AbstractDictionary{I, nothing} or something like that.

As for the empty collections... I have some thoughts but I'd really love to hear any ideas?

If you want to keep HasEltype behavior, I'd use AbstractDictionary{I, Union{}}. That's something I've been using in Transducers.jl/BangBang.jl. I'm interested in your thought on this, too.

tkf avatar Jul 13 '20 01:07 tkf

Oh my. I was thinking I was return_type-extremist :laughing:

:laughing: Yeah it was partly a joke. Honestly I'm an "extremist" in that we should either support/encourage it or remove it - the current middle ground seems painful. I personally think encouraging it would be better but I'm too far removed from the type inference implementation to have a valid opinion.

I was thinking something like

Hmm... it's interesting but I'm worried putting nothing there could potentially make life harder than removing it entirely. Dispatch could be messy. It's not clear if that type position is meant to be a type or a constant. Etc.

Regarding Union{} for empty collections I think this works out really well for "functional" style code and we use it in StaticArrays. For variable length collections you end up with things like Union{Indices{Union{}}. Indices{T}} coming out of inference distinct, etc. When the eltype is Union{T, Missing} you get Union{Indices{Union{}}. Indices{T}, Indices{Missing},Indices{Union{Missing,T}}} which is widened by inference (to Indices{<:Any} I believe?). So too slow for common use cases :(

I sometimes feel the language would work well if struct and Array were covariant and inference could simply widen the above to Indices{Union{Missing,T}} instead.

(Furthermore, for non-empty collections there's an unfortunate overhead in Union{} being treated as a box rather than a singleton. As in Vector{Union{}}(undef, 1_000_000) allocates 8 MB instead of a few bytes. I think this can be fixed.)

andyferris avatar Jul 14 '20 01:07 andyferris

either support/encourage it or remove it - the current middle ground seems painful

Yeah, I agree it's painful ATM. But I think it is worth trying to make the current state as usable as possible. This would let us see the logical conclusion of the current API (which means to embrace mutate-or-widen all the way down, I guess). Maybe it'd prompt compiler guys to create a better return_type (e.g., rigorously define "inferable" subset of Julia and then provide tryinfer(f, types) :: Union{Some{<:Type}, Nothing}).

That said, I'm actually kind of OK with current API of return_type where you are only "allowed" to use for optimizations. Rather, "in return", I'd like to have more ways to force/control union split of the compiler (like tail-call elimination API).

but I'm worried putting nothing there could potentially make life harder than removing it entirely

Another more "down-to-earth" API might be AbstractDictionary{I, T, IEltype <: IteratorEltype, TEltype <: IteratorEltype}. But I just thought you might not want too many type parameters.

When the eltype is Union{T, Missing} you get Union{Indices{Union{}}. Indices{T}, Indices{Missing},Indices{Union{Missing,T}}} which is widened by inference (to Indices{<:Any} I believe?).

Isn't this solvable if you write "tail-call function-barrier" pattern?

ref Tail-call optimization and function-barrier -based accumulation in loops - Internals & Design - JuliaLang

So too slow for common use cases :(

If you have the typocalypse-free base implementation, it's always OK to use return_type-based optimized path, right? I guess the common cases are inferable ones?

if struct and Array were covariant and inference could simply widen the above to Indices{Union{Missing,T}} instead

Hm... interesting. What would be the difference to union splitting in this approach? In particular, even with covariant Indices, don't f(::Indices{Union{Missing,T}}) still need to run f(::Indices{T}) at machine code level? Maybe it's more like pushing the union splitting to the callee side or something?

there's an unfortunate overhead in Union{}

Yeah, it'd be nice to get rid of this. Meanwhile, I just created MicroCollections.UndefArray etc. as a workaround.

tkf avatar Jul 14 '20 03:07 tkf

but I'm worried putting nothing there could potentially make life harder than removing it entirely

Another more "down-to-earth" API might be AbstractDictionary{I, T, IEltype <: IteratorEltype, TEltype <: IteratorEltype}.

Or maybe something like this?

abstract type AbstractIndexable{I,T} end
abstract type AbstractDictionary{I,T} <: AbstractIndexable{Some{I},Some{T}} end

IteratorEltype(::Type{AbstractIndexable{<:Any,Nothing}}) = EltypeUnknown()
IteratorEltype(::Type{AbstractIndexable{<:Any,<:Some}}) = HasEltype()

struct MappedIndexable{I} <: AbstractIndexable{Some{I},Nothing}
    ...
end

It's a bit unusual usage of Some{T}. But it might be better than putting the constant nothing in the type parameter?

tkf avatar Jul 14 '20 03:07 tkf

It's a bit unusual usage of Some{T}. But it might be better than putting the constant nothing in the type parameter?

Yes, certainly seems plausible though. It's an extra layer of complication, but oh well.

I guess I kinda like traits better than type parameters? I think they make for better dispatch patterns.

andyferris avatar Jul 14 '20 04:07 andyferris

Expanding on that - I'd mostly consider it an antipattern to dispatch on the element type of an abstract container.

Exceptions like calling BLAS for multiplying Matrix{Float64} would be better as _matmul(::LayoutTrait, ::Eltype, a::AbstractMatrix, b::AbstractMatrix). So eltype is just a trait. This avoids the StridedArray fiasco. Last time I did it editing the dispatch tables for matrix multiplication in LinearAlgebra was a huge minefield. And the result here is extensible (for new methods, or new matrix types, or even new layouts) with lower (IMO) risk of method ambiguity or user confusion and subtly helps enforce consistency accross the AbstractMatrix interface.

andyferris avatar Jul 14 '20 04:07 andyferris

I guess I don't mind if it uses trait and has no type parameters. Type parameters make it easy to implement traits "for free" but I guess that's just an implementation detail that does not have to leak out.

tkf avatar Jul 14 '20 04:07 tkf

I've also noticed this issue and, while it's true that fixing it for Iterators.map is a bit tricky (because of the type parameters of the lazy mapped dictionary), I was wondering: would it be easier to fix it for the regular Base.map in the meantime?

piever avatar Jan 02 '22 20:01 piever

You could overload map_with_eltype from my package LazyMapWithElType. It's just like Iterators.map, except that it also accepts a return type parameter.

I don't think there can be a much better option until map and Iterators.map get fixed in Base.

nsajko avatar Dec 24 '23 03:12 nsajko