NamedArrays.jl Type stability of NamedArrays

When working on https://github.com/nalimilan/FreqTables.jl/pull/19 we hit a problem of type inference of NamedArrays. Would it be possible to fix it?

Inference fails even the simplest constructor:

julia> @code_warntype NamedArray([1,2,3])
Variables:
  #self#::Type{NamedArrays.NamedArray}
  array::Array{Int64,1}

Body:
  begin
      SSAValue(0) = (Core.tuple)((Base.arraysize)(array::Array{Int64,1}, 1)::Int64)::Tuple{Int64}
      return $(Expr(:invoke, MethodInstance for NamedArrays.NamedArray(::Array{Int64,1}, ::Array{Array{String,1},1}, ::Array{Symbol,1}), :(#self#), :(array), :($(Expr(:invoke, MethodInstance for collect(::Base.Generator{Tuple{Int64},NamedArrays.#defaultnames}), :(Base.collect), :($(Expr(:new, Base.Generator{Tuple{Int64},NamedArrays.#defaultnames}, :($(QuoteNode(NamedArrays.defaultnames))), SSAValue(0))))))), :($(Expr(:invoke, MethodInstance for collect(::Base.Generator{UnitRange{Int64},NamedArrays.#defaultdimname}), :(Base.collect), :($(Expr(:new, Base.Generator{UnitRange{Int64},NamedArrays.#defaultdimname}, :($(QuoteNode(NamedArrays.defaultdimname))), :($(Expr(:new, UnitRange{Int64}, 1, :((Base.select_value)((Base.sle_int)(1, 1)::Bool, 1, (Base.sub_int)(1, 1)::Int64)::Int64))))))))))))
  end::Any

similar problems propagate to other operations (broadcasting, summing over margins, ...).

Dec 21 '17 22:12 bkamins

It would, but you have to give a little more context and pointers, because I have no idea what you mean, and the error message is sort-of difficult to parse for me.

Dec 22 '17 09:12 davidavdav

That's not an error message. The problem is that the compiler is not able to infer the return type of the NamedArray at all (visible via the ::Any at the end). I suspect this might be fixable by using ntuple instead of map and array comprehensions, and/or by adding type assertions like tuple(dimnames...)::NTuple{N}, since the compiler probably isn't smart enough to realize that the tuple of dimension names is necessarily of length N due to the arguments checks at the top of the function.

Dec 22 '17 09:12 nalimilan

Here you have more detailed examples:

julia> using NamedArrays, BenchmarkTools, Base.Test

julia> x = [1 2; 3 4]
2×2 Array{Int64,2}:
 1  2
 3  4

julia> @inferred NamedArray(x)
ERROR: return type NamedArrays.NamedArray{Int64,2,Array{Int64,2},Tuple{DataStructures.OrderedDict{String,Int64},DataStructures.OrderedDict{String,Int64}}} does not match inferred return type Any
Stacktrace:
 [1] error(::String) at .\error.jl:21

julia> y = NamedArray(x)
2×2 Named Array{Int64,2}
A ╲ B │ 1  2
──────┼─────
1     │ 1  2
2     │ 3  4

julia> @inferred y ./ [1,2]
ERROR: return type NamedArrays.NamedArray{Float64,2,Array{Float64,2},Tuple{DataStructures.OrderedDict{String,Int64},DataStructures.OrderedDict{String,Int64}}} does not match inferred return type Any
Stacktrace:
 [1] error(::String) at .\error.jl:21

julia> @inferred sum(y, 1)
ERROR: return type NamedArrays.NamedArray{Int64,2,Array{Int64,2},Tuple{DataStructures.OrderedDict{String,Int64},DataStructures.OrderedDict{String,Int64}}} does not match inferred return type Any
Stacktrace:
 [1] error(::String) at .\error.jl:21

and here is a consequence of this:

julia> function f1()
           x = Vector{Int}(10^6)
           y = NamedArray(x)
           for i in 1:10^6, j in 1:100
               x[i] = i
           end
       end
f1 (generic function with 1 method)

julia> function f2()
           x = Vector{Int}(10^6)
           y = NamedArray(x)
           for i in 1:10^6, j in 1:100
               y[i] = i
           end
       end
f2 (generic function with 1 method)

julia> @benchmark f1()
BenchmarkTools.Trial:
  memory estimate:  170.67 MiB
  allocs estimate:  3000071
  --------------
  minimum time:     507.500 ms (0.00% GC)
  median time:      648.597 ms (20.11% GC)
  mean time:        641.219 ms (21.47% GC)
  maximum time:     740.850 ms (29.49% GC)
  --------------
  samples:          8
  evals/sample:     1

julia> @benchmark f2()
BenchmarkTools.Trial:
  memory estimate:  3.15 GiB
  allocs estimate:  202897871
  --------------
  minimum time:     4.661 s (9.19% GC)
  median time:      4.676 s (9.46% GC)
  mean time:        4.676 s (9.46% GC)
  maximum time:     4.690 s (9.73% GC)
  --------------
  samples:          2
  evals/sample:     1

Dec 22 '17 09:12 bkamins

I can see there is a problem, now. So the constructor needs to be changed, then? I don't really understand why the compiler can't figure out the return type of the constructor (my guess is that it would be the type of the constructor, what other type would it be constructing? But apparently things are not that simple).

Dec 22 '17 10:12 davidavdav

The problem is the presence of type parameters. The compiler should be able to find out that the result is a NamedArray, but parameters are harder to infer, so it gives up and returns Any (which for performance is equivalent to NamedArray without knowing the parameters). It would help if you could pass explicitly these parameters when constructing the array. Else, the exact types of the arguments used to construct the array need to be inferable by the compiler, in particular the size of tuples. The difficulty comes from the fact that the length of a tuple is a type parameter, but the length of an array is not; and constructors transform arrays into tuples. So you need to help the compiler understand that this length is always equal to N.

Dec 22 '17 10:12 nalimilan

Ah thanks, so I should end a constructor with a call to a parent constructor with explicit type parameters, then? I'll try that.

(I must say I am still not over the syntax change to the where T where AT style, largely because that verbose style doesn't mean anything to me, I have no intuition what kind of information the where T gives me/the compiler---it appears pretty informationless.)

Dec 22 '17 10:12 davidavdav

If I run

f(n)::NTuple{n,Int} = tuple(fill(1, n)...)
 
@code_warntype f(1)

Variables:
  #self# <optimized out>
  n::Int64

Body:
  begin 
      return (Core.typeassert)((Base.convert)((Core.apply_type)(Main.NTuple, n::Int64, Main.Int)::Type{Tuple{Vararg{Int64,_}}} where _, (Core._apply)(Main.tuple, $(Expr(:invoke, MethodInstance for fill!(::Array{Int64,1}, ::Int64), :(Base.fill!), :($(Expr(:foreigncall, :(:jl_alloc_array_1d), Array{Int64,1}, svec(Any, Int64), Array{Int64,1}, 0, :(n), 0))), 1)))::Tuple{Vararg{Int64,N} where N})::Tuple{Vararg{Int64,N}} where N, (Core.apply_type)(Main.NTuple, n::Int64, Main.Int)::Type{Tuple{Vararg{Int64,_}}} where _)::Tuple{Vararg{Int64,N}} where N
  end::Tuple{Vararg{Int64,N}} where N

the Tuple{Vararg{Int64,N}} where N shows up red in my REPL, suggesting it is not a good thing, but I don't see how this could be more specific. Maybe Tuple{Vararg{Int64,1}}? But the 1 is only know at runtime.

Dec 22 '17 11:12 davidavdav

In that example, the code cannot be type-stable since as you note n is only known at runtime. But that's not the case for NamedArray AFAICT, since N depends on the dimension of the input array (or on the number of dimensions passed via arguments), which is a type parameter of the input argument.

Dec 22 '17 13:12 nalimilan

I've tried to stabilize the type in the constructor, see https://github.com/davidavdav/NamedArrays.jl/commit/cb253ad48bd26800864a088a63cabfe739b6afbb. The culprit seems to be dicts in function NamedArray{T,N}(array::AbstractArray{T,N}, names::NTuple{N,Vector}, dimnames::NTuple{N, Any}=defaultdimnames(array)). I still get red type warnings NamedArrays.NamedArray{Int64,1,Array{Int64,1},_} where _, suggesting that N isn't known.

But I have two places ensuring N is known, both in a dicts type assertment and in the explicit type parameter given in the returned constructor.

Calling NamedArrays.NamedArray([1], [OrderedDict("a" => 1)]) is now resolved, but somehow the default for the 2nd argument [defaultnames(d) for d in size(array)] screws up things.

I am kind-of debugged-out right now. I might consider to move the defaults for the 1-arg call of NamedArray to move to the first constructor, that uses a tuple as 2nd argument. That is probably better anyway, since it saves a lot of array-tuple conversions.

Dec 22 '17 14:12 davidavdav

I've tried to stabilize the type in the constructor, see cb253ad. The culprit seems to be dicts in function NamedArray{T,N}(array::AbstractArray{T,N}, names::NTuple{N,Vector}, dimnames::NTuple{N, Any}=defaultdimnames(array)). I still get red type warnings NamedArrays.NamedArray{Int64,1,Array{Int64,1},_} where _, suggesting that N isn't known.

The problem isn't N, it's that OrderedDict isn't a concrete type, so for the compiler it's not better than Any. Replacing it with OrderedDict{VT, Int} fixes the problem here.

Calling NamedArrays.NamedArray([1], [OrderedDict("a" => 1)]) is now resolved, but somehow the default for the 2nd argument [defaultnames(d) for d in size(array)] screws up things.

I don't think default keyword arguments can affect inferred: they're just like arguments passed manually. So the problem is always inside the function.

Are there any other issues? FWIW, I think some of the changes you have made are not really required. In particular, specifying the type parameters isn't useful when they are simply computed by calling typeof on the arguments: the compiler does that automatically. It also looks like ::NTuple{length(dims), OrderedDict} isn't needed for defaultnamesdict. In general, the approach I use is that once everything is type-stable, I remove hacks and assertion as much as possible until inference stops working.

Dec 22 '17 15:12 nalimilan

Thanks again,

I ended up in the black hole of trying to figure out what constructor is called, and I am reached a dead end (is that possible in a black hole? I suppose not. There is only the way forward). I commented-out all final constructor calls, and println'ed the entry point of every constructor, and I get an error that a non-existent constructor is called with an amount of arguments for which there is no code. And no result of printlns.

The type assertions are attempts to convince the compiler of the type, but apparently still are incomplete. I'll see if your suge=gestions work, now.

Dec 22 '17 15:12 davidavdav

Tried to follow your suggestions in https://github.com/davidavdav/NamedArrays.jl/commit/a9bd5fff908073f154431e17c7e6de6d6ac9275f, but I still get type problem NamedArrays.NamedArray{Int64,1,Array{Int64,1},_} where _. The call really is with 4th type argument NTuple{N, OrderedDict{VT,Int}}. I don't see why that is not seen by the compiler.

Dec 22 '17 15:12 davidavdav

Hmm. It appears the type assertion that commit removes is actually needed, and must be more precise: with ::NTuple{N, OrderedDict{VT,Int}}, it works. Actually, you don't need to specify the type parameters if you do that. It might be possible to reorganize defaultnamesdict to avoid the need for a type assertion, but I guess that's good enough as long as it works.

Note that the best way to ensure this does not regress is to add @inferred everywhere possible in the tests.

Dec 22 '17 15:12 nalimilan

Allright, it was slightly more difficult than that, but in the end it turned out to be eltype(VT) that was needed in the type assertion.

If this now works for you guys, I can merge in master and eventually update the version in METADATA.

Dec 22 '17 21:12 davidavdav

Great! Maybe @bkamins can confirm, but yes, a new release would certainly improve performance for all users, and notably FreqTables.

Dec 22 '17 22:12 nalimilan

Great work! Thank you. I have checked the fix-58 branch. For tagging of the next release it would be perfect if the following two cases were fixed (the root cause is probably that broadcasting has some type inference problems):

julia> using Base.Test, NamedArrays

julia> y = @inferred NamedArray(rand(3,3)) # this is inferred correctly!
3×3 Named Array{Float64,2}
A ╲ B │         1          2          3
──────┼────────────────────────────────
1     │  0.864412   0.867367   0.399662
2     │  0.885714  0.0843252   0.169732
3     │  0.704418   0.441152   0.787319

julia> @inferred sum(y, 1)
ERROR: return type NamedArrays.NamedArray{Float64,2,Array{Float64,2},Tuple{DataStructures.OrderedDict{String,Int64},DataStructures.OrderedDict{String,Int64}}} does not match inferred return type Any
Stacktrace:
 [1] error(::String) at .\error.jl:21

julia> @inferred y ./ [1,2,3]
ERROR: return type NamedArrays.NamedArray{Float64,2,Array{Float64,2},Tuple{DataStructures.OrderedDict{String,Int64},DataStructures.OrderedDict{String,Int64}}} does not match inferred return type Any
Stacktrace:
 [1] error(::String) at .\error.jl:21

Dec 23 '17 07:12 bkamins

Those are probably what you need for freqtables. OK, I'll have a dig at them.

Dec 23 '17 12:12 davidavdav

Thanks!

Dec 23 '17 18:12 bkamins

The sum was easy, that was my own code, fixed in c19a721.

I have trouble finding out which code actually runs issuing a statement like y ./ [1, 2, 3]. Actually, I always have trouble figuring out which lines of code run, I've never been able to find a way of working that out. I know there is a fantastic debugger, but I have never got that going to work, either.

It looks like in julia-v0.6 the handling of ./ has been largely overhauled, and that NamedArrays now relies mostly on code in Base. The only place I hook in with my code, now, I think, is in a call to Base.Broadcast.broadcast_c(f, t::Type{NamedArray}, As...). I see that in the code there (if that actually gets called, I have no idea, really) I try to construct the names, and if that fails, I return a normal Array. That is probably a very type-instable thing to do.

Dec 24 '17 13:12 davidavdav

Yes, that must be the function which is called by ./. Indeed, returning an Array in some cases is type-unstable unless that choice depends only on the types of the arguments.

Note that in 0.7 the broadcast machinery has been changed again, to make it easier to implement (and it's now documented). So it could be worth fixing this only on 0.7, though some changes might be common to both the 0.6 and the 0.7 implementations.

Dec 24 '17 14:12 nalimilan

I have checked it earlier (unfortunately as you say - this is a bit of digging) and in the end indeed broadcast_c is called.

Probably it is best to follow @nalimilan recommendation in the implementation.

Dec 24 '17 18:12 bkamins

Hi,

OK, well, I'll have a quick dig at it then, but maybe concentrate on the type-stability of some other functions as well, and then release this for 0.6. Further fixes (like the ./) can then be done with 0.7 support only.

Dec 24 '17 22:12 davidavdav

I don't seem to be able to able to convince Julia what the resulting type of broadcast_c is. I've spelled it all out in 41d7c45, but that is not enough. Problem is that the types of the names of a broadcast_c NamedArray can depend on any of the arrays involved. It appears to me that this kind of type-stability programming kind-of defeats the purpose of Julia.

Dec 26 '17 13:12 davidavdav

No, I think that's solvable, just like promote_type can work with any number of arguments. The only requirement is that the return type must depend only on the types of inputs. To make the compiler able to use that information, I think there are two approaches:

compute the type manually from only the input types, and pass that as a type parameter instead of typeof(tdicts) (whose type is currently not inferred)
simplify/reorganize the code to allow the compiler to do that automatically

I think for both cases, an approach which could help would use two steps:

For each input array, compute a tuple of OrderedDict objects that would be used the result according to this input array, falling back on nothing for dimensions which do not exist or do not have names (for non-NamedArray inputs). This should be done via a helper function which would dispatch on the input type and number of dimensions (passed as a Val{N} object to keep it in the type domain).
Pass all these tuples of OrderedDict to a function which would call itself recursively and return another tuple of OrderedDict, replacing nothing entries with the corresponding OrderedDict, if different from nothing.
Replace any remaining nothing entries by calling defaultnamesdict.

I admit that's a bit complex. The key here is to ensure that the helper function are themselves type-stable and that the compiler is able to guess their return types from their input types.

Another solution would be to use a generated function. It could work with an organization more similar to the current code, but generated functions are not completely trivial to write either.

Dec 26 '17 14:12 nalimilan

(Truly, I didn't close this issue, Github did that for me.)

Thanks for the detailed proposal. I can see why the type-compiler (is this called "inference"?) has problems deducing the typeof(dicts) parts, although I believe this addition did help in the case of the simpler case of the constructor (which surprised me a bit at the time).

Your proposal would take me couple of days to process, I suppose, but I'll have another dig at it.

Dec 27 '17 09:12 davidavdav

Hello @nalimilan

I don't get anywhere with type programming, see c299c71 . Even the simplest functionality seems to require a typeof() which appears to be a no-go for type inference, so I guess I am missing an essential thing.

Jan 01 '18 11:01 davidavdav

Small progress in 891fa55, but can't get dictstyperecursive() to be type stable.

Jan 01 '18 16:01 davidavdav

NamedArrays.jl NamedArrays.jl copied to clipboard

Type stability of NamedArrays

NamedArrays.jl
NamedArrays.jl copied to clipboard