NamedArrays.jl
NamedArrays.jl copied to clipboard
Type stability of NamedArrays
When working on https://github.com/nalimilan/FreqTables.jl/pull/19 we hit a problem of type inference of NamedArrays
. Would it be possible to fix it?
Inference fails even the simplest constructor:
julia> @code_warntype NamedArray([1,2,3])
Variables:
#self#::Type{NamedArrays.NamedArray}
array::Array{Int64,1}
Body:
begin
SSAValue(0) = (Core.tuple)((Base.arraysize)(array::Array{Int64,1}, 1)::Int64)::Tuple{Int64}
return $(Expr(:invoke, MethodInstance for NamedArrays.NamedArray(::Array{Int64,1}, ::Array{Array{String,1},1}, ::Array{Symbol,1}), :(#self#), :(array), :($(Expr(:invoke, MethodInstance for collect(::Base.Generator{Tuple{Int64},NamedArrays.#defaultnames}), :(Base.collect), :($(Expr(:new, Base.Generator{Tuple{Int64},NamedArrays.#defaultnames}, :($(QuoteNode(NamedArrays.defaultnames))), SSAValue(0))))))), :($(Expr(:invoke, MethodInstance for collect(::Base.Generator{UnitRange{Int64},NamedArrays.#defaultdimname}), :(Base.collect), :($(Expr(:new, Base.Generator{UnitRange{Int64},NamedArrays.#defaultdimname}, :($(QuoteNode(NamedArrays.defaultdimname))), :($(Expr(:new, UnitRange{Int64}, 1, :((Base.select_value)((Base.sle_int)(1, 1)::Bool, 1, (Base.sub_int)(1, 1)::Int64)::Int64))))))))))))
end::Any
similar problems propagate to other operations (broadcasting, summing over margins, ...).
It would, but you have to give a little more context and pointers, because I have no idea what you mean, and the error message is sort-of difficult to parse for me.
That's not an error message. The problem is that the compiler is not able to infer the return type of the NamedArray
at all (visible via the ::Any
at the end). I suspect this might be fixable by using ntuple
instead of map
and array comprehensions, and/or by adding type assertions like tuple(dimnames...)::NTuple{N}
, since the compiler probably isn't smart enough to realize that the tuple of dimension names is necessarily of length N
due to the arguments checks at the top of the function.
Here you have more detailed examples:
julia> using NamedArrays, BenchmarkTools, Base.Test
julia> x = [1 2; 3 4]
2×2 Array{Int64,2}:
1 2
3 4
julia> @inferred NamedArray(x)
ERROR: return type NamedArrays.NamedArray{Int64,2,Array{Int64,2},Tuple{DataStructures.OrderedDict{String,Int64},DataStructures.OrderedDict{String,Int64}}} does not match inferred return type Any
Stacktrace:
[1] error(::String) at .\error.jl:21
julia> y = NamedArray(x)
2×2 Named Array{Int64,2}
A ╲ B │ 1 2
──────┼─────
1 │ 1 2
2 │ 3 4
julia> @inferred y ./ [1,2]
ERROR: return type NamedArrays.NamedArray{Float64,2,Array{Float64,2},Tuple{DataStructures.OrderedDict{String,Int64},DataStructures.OrderedDict{String,Int64}}} does not match inferred return type Any
Stacktrace:
[1] error(::String) at .\error.jl:21
julia> @inferred sum(y, 1)
ERROR: return type NamedArrays.NamedArray{Int64,2,Array{Int64,2},Tuple{DataStructures.OrderedDict{String,Int64},DataStructures.OrderedDict{String,Int64}}} does not match inferred return type Any
Stacktrace:
[1] error(::String) at .\error.jl:21
and here is a consequence of this:
julia> function f1()
x = Vector{Int}(10^6)
y = NamedArray(x)
for i in 1:10^6, j in 1:100
x[i] = i
end
end
f1 (generic function with 1 method)
julia> function f2()
x = Vector{Int}(10^6)
y = NamedArray(x)
for i in 1:10^6, j in 1:100
y[i] = i
end
end
f2 (generic function with 1 method)
julia> @benchmark f1()
BenchmarkTools.Trial:
memory estimate: 170.67 MiB
allocs estimate: 3000071
--------------
minimum time: 507.500 ms (0.00% GC)
median time: 648.597 ms (20.11% GC)
mean time: 641.219 ms (21.47% GC)
maximum time: 740.850 ms (29.49% GC)
--------------
samples: 8
evals/sample: 1
julia> @benchmark f2()
BenchmarkTools.Trial:
memory estimate: 3.15 GiB
allocs estimate: 202897871
--------------
minimum time: 4.661 s (9.19% GC)
median time: 4.676 s (9.46% GC)
mean time: 4.676 s (9.46% GC)
maximum time: 4.690 s (9.73% GC)
--------------
samples: 2
evals/sample: 1
I can see there is a problem, now. So the constructor needs to be changed, then? I don't really understand why the compiler can't figure out the return type of the constructor (my guess is that it would be the type of the constructor, what other type would it be constructing? But apparently things are not that simple).
The problem is the presence of type parameters. The compiler should be able to find out that the result is a NamedArray
, but parameters are harder to infer, so it gives up and returns Any
(which for performance is equivalent to NamedArray
without knowing the parameters). It would help if you could pass explicitly these parameters when constructing the array. Else, the exact types of the arguments used to construct the array need to be inferable by the compiler, in particular the size of tuples. The difficulty comes from the fact that the length of a tuple is a type parameter, but the length of an array is not; and constructors transform arrays into tuples. So you need to help the compiler understand that this length is always equal to N
.
Ah thanks, so I should end a constructor with a call to a parent constructor with explicit type parameters, then? I'll try that.
(I must say I am still not over the syntax change to the where T where AT
style, largely because that verbose style doesn't mean anything to me, I have no intuition what kind of information the where T
gives me/the compiler---it appears pretty informationless.)
If I run
f(n)::NTuple{n,Int} = tuple(fill(1, n)...)
@code_warntype f(1)
Variables:
#self# <optimized out>
n::Int64
Body:
begin
return (Core.typeassert)((Base.convert)((Core.apply_type)(Main.NTuple, n::Int64, Main.Int)::Type{Tuple{Vararg{Int64,_}}} where _, (Core._apply)(Main.tuple, $(Expr(:invoke, MethodInstance for fill!(::Array{Int64,1}, ::Int64), :(Base.fill!), :($(Expr(:foreigncall, :(:jl_alloc_array_1d), Array{Int64,1}, svec(Any, Int64), Array{Int64,1}, 0, :(n), 0))), 1)))::Tuple{Vararg{Int64,N} where N})::Tuple{Vararg{Int64,N}} where N, (Core.apply_type)(Main.NTuple, n::Int64, Main.Int)::Type{Tuple{Vararg{Int64,_}}} where _)::Tuple{Vararg{Int64,N}} where N
end::Tuple{Vararg{Int64,N}} where N
the Tuple{Vararg{Int64,N}} where N
shows up red in my REPL, suggesting it is not a good thing, but I don't see how this could be more specific. Maybe Tuple{Vararg{Int64,1}}
? But the 1
is only know at runtime.
In that example, the code cannot be type-stable since as you note n
is only known at runtime. But that's not the case for NamedArray
AFAICT, since N
depends on the dimension of the input array (or on the number of dimensions passed via arguments), which is a type parameter of the input argument.
I've tried to stabilize the type in the constructor, see https://github.com/davidavdav/NamedArrays.jl/commit/cb253ad48bd26800864a088a63cabfe739b6afbb. The culprit seems to be dicts
in function NamedArray{T,N}(array::AbstractArray{T,N}, names::NTuple{N,Vector}, dimnames::NTuple{N, Any}=defaultdimnames(array))
. I still get red type warnings NamedArrays.NamedArray{Int64,1,Array{Int64,1},_} where _
, suggesting that N
isn't known.
But I have two places ensuring N
is known, both in a dicts
type assertment and in the explicit type parameter given in the returned constructor.
Calling NamedArrays.NamedArray([1], [OrderedDict("a" => 1)])
is now resolved, but somehow the default for the 2nd argument [defaultnames(d) for d in size(array)]
screws up things.
I am kind-of debugged-out right now. I might consider to move the defaults for the 1-arg call of NamedArray
to move to the first constructor, that uses a tuple as 2nd argument. That is probably better anyway, since it saves a lot of array-tuple conversions.
I've tried to stabilize the type in the constructor, see cb253ad. The culprit seems to be dicts in function NamedArray{T,N}(array::AbstractArray{T,N}, names::NTuple{N,Vector}, dimnames::NTuple{N, Any}=defaultdimnames(array)). I still get red type warnings NamedArrays.NamedArray{Int64,1,Array{Int64,1},_} where _, suggesting that N isn't known.
The problem isn't N
, it's that OrderedDict
isn't a concrete type, so for the compiler it's not better than Any
. Replacing it with OrderedDict{VT, Int}
fixes the problem here.
Calling NamedArrays.NamedArray([1], [OrderedDict("a" => 1)]) is now resolved, but somehow the default for the 2nd argument [defaultnames(d) for d in size(array)] screws up things.
I don't think default keyword arguments can affect inferred: they're just like arguments passed manually. So the problem is always inside the function.
Are there any other issues? FWIW, I think some of the changes you have made are not really required. In particular, specifying the type parameters isn't useful when they are simply computed by calling typeof
on the arguments: the compiler does that automatically. It also looks like ::NTuple{length(dims), OrderedDict}
isn't needed for defaultnamesdict
. In general, the approach I use is that once everything is type-stable, I remove hacks and assertion as much as possible until inference stops working.
Thanks again,
I ended up in the black hole of trying to figure out what constructor is called, and I am reached a dead end (is that possible in a black hole? I suppose not. There is only the way forward). I commented-out all final constructor calls, and println'ed the entry point of every constructor, and I get an error that a non-existent constructor is called with an amount of arguments for which there is no code. And no result of printlns.
The type assertions are attempts to convince the compiler of the type, but apparently still are incomplete. I'll see if your suge=gestions work, now.
Tried to follow your suggestions in https://github.com/davidavdav/NamedArrays.jl/commit/a9bd5fff908073f154431e17c7e6de6d6ac9275f, but I still get type problem NamedArrays.NamedArray{Int64,1,Array{Int64,1},_} where _
. The call really is with 4th type argument NTuple{N, OrderedDict{VT,Int}}
. I don't see why that is not seen by the compiler.
Hmm. It appears the type assertion that commit removes is actually needed, and must be more precise: with ::NTuple{N, OrderedDict{VT,Int}}
, it works. Actually, you don't need to specify the type parameters if you do that. It might be possible to reorganize defaultnamesdict
to avoid the need for a type assertion, but I guess that's good enough as long as it works.
Note that the best way to ensure this does not regress is to add @inferred
everywhere possible in the tests.
Allright, it was slightly more difficult than that, but in the end it turned out to be eltype(VT)
that was needed in the type assertion.
If this now works for you guys, I can merge in master and eventually update the version in METADATA.
Great! Maybe @bkamins can confirm, but yes, a new release would certainly improve performance for all users, and notably FreqTables.
Great work! Thank you. I have checked the fix-58 branch. For tagging of the next release it would be perfect if the following two cases were fixed (the root cause is probably that broadcasting has some type inference problems):
julia> using Base.Test, NamedArrays
julia> y = @inferred NamedArray(rand(3,3)) # this is inferred correctly!
3×3 Named Array{Float64,2}
A ╲ B │ 1 2 3
──────┼────────────────────────────────
1 │ 0.864412 0.867367 0.399662
2 │ 0.885714 0.0843252 0.169732
3 │ 0.704418 0.441152 0.787319
julia> @inferred sum(y, 1)
ERROR: return type NamedArrays.NamedArray{Float64,2,Array{Float64,2},Tuple{DataStructures.OrderedDict{String,Int64},DataStructures.OrderedDict{String,Int64}}} does not match inferred return type Any
Stacktrace:
[1] error(::String) at .\error.jl:21
julia> @inferred y ./ [1,2,3]
ERROR: return type NamedArrays.NamedArray{Float64,2,Array{Float64,2},Tuple{DataStructures.OrderedDict{String,Int64},DataStructures.OrderedDict{String,Int64}}} does not match inferred return type Any
Stacktrace:
[1] error(::String) at .\error.jl:21
Those are probably what you need for freqtables. OK, I'll have a dig at them.
Thanks!
The sum
was easy, that was my own code, fixed in c19a721.
I have trouble finding out which code actually runs issuing a statement like y ./ [1, 2, 3]
. Actually, I always have trouble figuring out which lines of code run, I've never been able to find a way of working that out. I know there is a fantastic debugger, but I have never got that going to work, either.
It looks like in julia-v0.6
the handling of ./
has been largely overhauled, and that NamedArrays now relies mostly on code in Base. The only place I hook in with my code, now, I think, is in a call to Base.Broadcast.broadcast_c(f, t::Type{NamedArray}, As...)
. I see that in the code there (if that actually gets called, I have no idea, really) I try to construct the names, and if that fails, I return a normal Array
. That is probably a very type-instable thing to do.
Yes, that must be the function which is called by ./
. Indeed, returning an Array
in some cases is type-unstable unless that choice depends only on the types of the arguments.
Note that in 0.7 the broadcast
machinery has been changed again, to make it easier to implement (and it's now documented). So it could be worth fixing this only on 0.7, though some changes might be common to both the 0.6 and the 0.7 implementations.
I have checked it earlier (unfortunately as you say - this is a bit of digging) and in the end indeed broadcast_c
is called.
Probably it is best to follow @nalimilan recommendation in the implementation.
Hi,
OK, well, I'll have a quick dig at it then, but maybe concentrate on the type-stability of some other functions as well, and then release this for 0.6. Further fixes (like the ./
) can then be done with 0.7 support only.
I don't seem to be able to able to convince Julia what the resulting type of broadcast_c
is. I've spelled it all out in 41d7c45, but that is not enough. Problem is that the types of the names of a broadcast_c
NamedArray can depend on any of the arrays involved. It appears to me that this kind of type-stability programming kind-of defeats the purpose of Julia.
No, I think that's solvable, just like promote_type
can work with any number of arguments. The only requirement is that the return type must depend only on the types of inputs. To make the compiler able to use that information, I think there are two approaches:
- compute the type manually from only the input types, and pass that as a type parameter instead of
typeof(tdicts)
(whose type is currently not inferred) - simplify/reorganize the code to allow the compiler to do that automatically
I think for both cases, an approach which could help would use two steps:
- For each input array, compute a tuple of
OrderedDict
objects that would be used the result according to this input array, falling back onnothing
for dimensions which do not exist or do not have names (for non-NamedArray
inputs). This should be done via a helper function which would dispatch on the input type and number of dimensions (passed as aVal{N}
object to keep it in the type domain). - Pass all these tuples of
OrderedDict
to a function which would call itself recursively and return another tuple ofOrderedDict
, replacingnothing
entries with the correspondingOrderedDict
, if different fromnothing
. - Replace any remaining
nothing
entries by callingdefaultnamesdict
.
I admit that's a bit complex. The key here is to ensure that the helper function are themselves type-stable and that the compiler is able to guess their return types from their input types.
Another solution would be to use a generated function. It could work with an organization more similar to the current code, but generated functions are not completely trivial to write either.
(Truly, I didn't close this issue, Github did that for me.)
Thanks for the detailed proposal. I can see why the type-compiler (is this called "inference"?) has problems deducing the typeof(dicts)
parts, although I believe this addition did help in the case of the simpler case of the constructor (which surprised me a bit at the time).
Your proposal would take me couple of days to process, I suppose, but I'll have another dig at it.
Hello @nalimilan
I don't get anywhere with type programming, see c299c71 . Even the simplest functionality seems to require a typeof()
which appears to be a no-go for type inference, so I guess I am missing an essential thing.
Small progress in 891fa55, but can't get dictstyperecursive()
to be type stable.