SparseArrays.jl
Operations on SparseMatrices depend heavily on inference
In JuliaLang/julia#19518, broadcast for sparse matrices underwent a major revision, overall a move in the right direction, IMHO. But it also resulted in a regression (in my eyes) if type inference fails for the operation/element type combination. This issue is meant to continue the discussion started in https://github.com/JuliaLang/julia/pull/19518#issuecomment-266693283.
There are three aspects here that make the problem more acute than for ordinary Arrays.

- For sparse matrices, even basic operations like `+` and `-` just invoke `broadcast` internally. OTOH, for `Array`s, these do some special tricks for the result type, which helps for the empty case. However, getting the type right for the empty case may not be important enough to warrant such special-casing to be replicated.
- However, `broadcast` for sparse matrices relies on inference to determine the output element type not only in the empty (as in all-dimensions-zero) case, but always ~~also if no entries need to be stored (everything zero), making this case much more likely to occur~~.
- Once inference fails for an all-zero (or truly empty) case, the element type becomes `Any`, which precludes all further operations as `zero(::Any)` is undefined.

E.g.:
```julia
julia> x = spzeros(Real, 2, 2)
2×2 sparse matrix with 0 Real nonzero entries

julia> x + x
2×2 sparse matrix with 0 Any nonzero entries

julia> x + x + x
ERROR: MethodError: no method matching zero(::Type{Any})
```
Edit: Similarly also
```julia
julia> speye(Real, 3) * 1 * 1
ERROR: MethodError: no method matching zero(::Type{Any})
```
The immediate fix I have in mind (but I find the sparse broadcast machinery too intimidating at first glance to try to implement properly) is the following: If the result of `broadcast(op, a, b)` would have element type `Any` (i.e. inference failed) and `nzval` is empty, use `typeof(op(zero(eltype(a)), zero(eltype(b))))` as element type instead. (Likewise for one- and more-argument cases, of course.) For the example above, this would give an element type of `Int`, which would be much more useful than `Any`.
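In code, that fallback could look roughly like the following minimal sketch (hypothetical helper name, not the actual broadcast machinery, written against the Base sparse code of the Julia version discussed here):

```julia
# Rough sketch of the proposed fallback, two-argument case. `res` stands for the
# result as broadcast(op, a, b) currently produces it; the helper name is made up.
function fixup_eltype(op, a, b, res)
    if eltype(res) === Any && isempty(res.nzval)
        # Inference failed and no entries are stored: derive a concrete element
        # type from the operation applied to zeros of the input element types.
        T = typeof(op(zero(eltype(a)), zero(eltype(b))))
        return convert(SparseMatrixCSC{T, eltype(res.colptr)}, res)
    end
    return res
end
```

For `spzeros(Real, 2, 2) + spzeros(Real, 2, 2)` this would yield element type `Int` instead of `Any`.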
Ah, no, point 2 does not hold, things are even worse:
```julia
julia> x = sparse(eye(Real,3,3))
3×3 sparse matrix with 3 Real nonzero entries:
  [1, 1]  =  1
  [2, 2]  =  1
  [3, 3]  =  1

julia> x + x
ERROR: MethodError: no method matching zero(::Type{Any})
```
Edit: Oh, I see, that is in part fixed by JuliaLang/julia#19589.
But even on that branch:
```julia
julia> x + x
3×3 sparse matrix with 3 Any nonzero entries:
  [1, 1]  =  2
  [2, 2]  =  2
  [3, 3]  =  2
```

Element type is `Any`, while `broadcast` on `Array` gives the much more useful element type of `Int`:

```julia
julia> broadcast(+, eye(Real,3,3), eye(Real,3,3))
3×3 Array{Int64,2}:
 2  0  0
 0  2  0
 0  0  2
```
We really should get that for sparse matrices, too.
I have added
```diff
--- a/base/sparse/sparsematrix.jl
+++ b/base/sparse/sparsematrix.jl
@@ -1434,7 +1434,15 @@ function broadcast!{Tf,N}(f::Tf, C::SparseMatrixCSC, A::SparseMatrixCSC, Bs::Var
     _broadcast_notzeropres!(f, fofzeros, C, A, Bs...)
 end
 function broadcast{Tf,N}(f::Tf, A::SparseMatrixCSC, Bs::Vararg{SparseMatrixCSC,N})
-    _aresameshape(A, Bs...) && return map(f, A, Bs...) # could avoid a second dims check in map
+    if _aresameshape(A, Bs...)
+        res = map(f, A, Bs...) # could avoid a second dims check in map
+        if !isleaftype(eltype(res))
+            t = mapreduce(typeof, typejoin, typeof(f(zero(eltype(A)), zero.(eltype.(Bs))...)), res.nzval)
+            return convert(SparseMatrixCSC{t,indtype(res)}, res)
+        else
+            return res
+        end
+    end
     fofzeros = f(_zeros_eltypes(A, Bs...)...)
     fpreszeros = !_broadcast_isnonzero(fofzeros)
     indextypeC = _promote_indtype(A, Bs...)
```
on top of JuliaLang/julia#19589 and that has fixed my most pressing troubles. I guess we would want this behavior

- in all cases, not just when delegating to `map`,
- implemented in a more efficient manner, if possible. (Similar to how it is done for `Array`, perhaps?)

@Sacha0 do you agree and if so, could you tackle that?
What would happen if we made `0` (as in the `Int`) the result of `zero(::Any)`? This would not be type stable, but it only comes up in code that already isn't, so I don't think we lose anything there. It also has the advantage that it should lead to relatively sane behavior.
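A minimal sketch of what that would amount to (not actual Base code; defining this outside Base would be type piracy, it is shown only to illustrate the idea):

```julia
# Hypothetical sketch of the proposal: give zero(::Type{Any}) a concrete fallback.
Base.zero(::Type{Any}) = 0
```

With such a definition, `x + x + x` from the example above would no longer throw, though the result element type would still be `Any`.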
> do you agree and if so, could you tackle that?

Before expressing an opinion I would like to think this issue through carefully (and possibly play with an implementation). Completing the overall functionality (extending the present work to sparse vectors, combinations of sparse vectors / matrices and scalars, et cetera) seems higher priority and likely to consume all time I have prior to feature freeze. Could we revisit this topic in a few weeks?
In the interim, if moving from promote_op to _promote_op(f, typestuple(As...)) in map/broadcast over one or two input sparse matrices is causing substantial breakage, as a stopgap we could return to calling promote_op in those cases.
Thoughts? Thanks and best!
> Completing the overall functionality (extending the present work to sparse vectors, combinations of sparse vectors / matrices and scalars, et cetera) seems higher priority and likely to consume all time I have prior to feature freeze. Could we revisit this topic in a few weeks?

Counterproposal: We try to fix any mistakes first before repeating them for the other cases.
Having thought about this some more, I am pretty sure that the sparse broadcasting code should never depend on type inference. For the `Array` code, inference is used as a last resort (and maybe as a performance optimization?), namely when broadcasting over empty inputs, where no actual output elements have been computed whose types could be used. For sparse matrices, we compute `f(zero(...), ...)` anyway, so we can use the type of that for the empty/no-stored-elements case, and the typejoin of all computed output values (including `f(zero(...), ...)`) otherwise, similar to how it is done for `Array`s.
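A minimal sketch of that rule (the helper name is mine, not from the actual sparse code):

```julia
# Inference-free eltype choice for sparse broadcast, assuming the stored output
# values have already been computed into `nzvals` and that
# `fofzeros = f(zero(eltype(A)), zero(eltype(B)), ...)` is computed anyway.
function inference_free_eltype(fofzeros, nzvals)
    isempty(nzvals) && return typeof(fofzeros)   # empty / no stored entries
    return typejoin(typeof(fofzeros), mapreduce(typeof, typejoin, nzvals))
end
```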
Marking as a regression and adding a milestone, as this means arithmetic on sparse matrices becomes broken whenever inference hiccups.
Thoughts:
The result eltype of sparse broadcast should match that of dense broadcast. Dense broadcast determines result eltype as follows: It retrieves the inferred eltype. If (1) the inferred eltype is concrete, it uses the inferred eltype. If (2) the result is empty, it uses the inferred eltype. If (3) the inferred eltype is abstract and the result is nonempty, it computes and uses the typejoin of the types of the result's values.
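In pseudocode, a sketch of that rule (the helper name is mine; `isleaftype` is the concreteness check in the Julia version discussed here):

```julia
# Sketch of the dense-broadcast eltype rule described above; `inferred` is the
# eltype obtained from inference, `vals` are the computed result values.
function dense_result_eltype(inferred::Type, vals)
    isleaftype(inferred) && return inferred      # (1) concrete inferred eltype
    isempty(vals) && return inferred             # (2) empty result: nothing better available
    return mapreduce(typeof, typejoin, vals)     # (3) typejoin of the computed values' types
end
```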
Note that a sparse vector/matrix resulting from broadcast that stores no entries is not generally empty: It contains some combination of zeros. In that situation sparse broadcast's return eltype should also match that of dense broadcast, using the inferred eltype if the inferred eltype is concrete or, if the inferred eltype is abstract and the result is nonempty, the typejoin of the types of the result's values. Similarly note that a truly empty sparse vector/matrix resulting from broadcast does not contain zeros of any kind, in which situation consistency with dense broadcast's result eltype (the inferred eltype) seems the most important concern.
I have a plan for realizing this behavior fully. At present sparse broadcast's behavior matches dense broadcast's behavior in cases (1) and (2) of the three above, covering most use. Though I hope to match case (3) in the near-term (next few weeks), prioritizing overall functionality (involving feature introduction) over matching case (3) seems prudent thirteen days out from feature freeze.
> Marking as a regression and adding a milestone, as this means arithmetic on sparse matrices becomes broken whenever inference hiccups.

Kindly remember that as of a few weeks ago:

- generic sparse broadcast over a single matrix did not exist, only a few specializations for common operations (which, having no promotion mechanism, often got the result type incorrect even in simple cases, e.g. JuliaLang/julia#18974);
- sparse broadcast over a pair of matrices existed, but was not generic (silently returning incorrect results as e.g. in JuliaLang/julia#19110), and its "promotion mechanism" was `promote_type` on the input eltypes (often incorrect, though 'accidentally' correct in some of the cases mentioned above);
- sparse broadcast did not exist for more than two input matrices.

With that in mind, please consider that characterizing generic sparse broadcast's present promotion behavior as a regression seems a bit of a stretch :wink:. In contrast, suboptimal, inconsistent, or incomplete all seem like reasonable characterizations.
As noted above, promotion cases (1) and (2) cover most use, and case (3) can be matched after feature freeze. On the other hand, extending generic sparse broadcast to sparse vectors and combinations involving scalars needs to be complete by feature freeze. As such, prioritizing the latter seems prudent at this time; please bear with me for the short while that requires.
Best!
> truly empty sparse vector/matrix resulting from `broadcast` does not contain zeros of any kind, in which situation consistency with dense `broadcast`'s result eltype (the inferred eltype) seems the most important concern.

Well, type inference is used in the empty case for dense broadcast primarily due to the lack of alternatives. Whether that should be replicated for the sparse case might be open for debate, as here, using zero to obtain a pseudo-element might be OK, while it is definitely not OK for dense arrays. So I'm not sure this is the "most important concern".
> With that in mind, please consider that characterizing generic sparse `broadcast`'s present promotion behavior as a regression seems a bit of a stretch

Oh, your work on sparse broadcast is certainly a huge step forward, no doubt. I'm not characterizing it as a regression. I'm merely saying it led to a (fairly specific) regression: something that ought to work and worked before stopped working. Unfortunately, this regression completely broke ACME, which is why I'm personally very interested in seeing it fixed. I think I have found a workaround, though, so I'm a bit more relaxed now. :smile:
On a related note:
```julia
julia> a = Real[0];

julia> a + a
1-element Array{Real,1}:
 0

julia> broadcast(+, a, a)
1-element Array{Int64,1}:
 0
```
because `+` uses the (hacky?) `promote_op` while `broadcast(+, ...)` does not. Would you say we should make dense and sparse behave the same? If so, maybe it's a good opportunity to rethink the dense behavior?
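For concreteness, a rough sketch of the two rules being contrasted here (illustrative only, not the actual Base implementations):

```julia
# The two competing eltype rules behind the discrepancy above.
# `+` on arrays derives the destination eltype from the *input* eltypes:
input_based_eltype(a, b) = Base.promote_op(+, eltype(a), eltype(b))
# broadcast derives it from the *computed values* (falling back to inference
# when the result is empty):
value_based_eltype(a, b) = mapreduce(typeof, typejoin, broadcast(+, a, b))
```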
> I think I have found a workaround

Cheers, glad to hear :).
> On a related note:

I agree wholeheartedly that the `(Real[0] + Real[0])::Vector{Real}` versus `broadcast(+, Real[0], Real[0])::Vector{Int}` discrepancy is disconcerting, particularly given that after JuliaLang/julia#17623 `Real[0] + Real[0]` and `Real[0] .+ Real[0]` will bear different eltypes.
With the disclaimer that I haven't fully thought this issue through yet, like @pabloferz I err on the side of `Real[0] + Real[0]` yielding a `Vector{Int}` (consistent with `broadcast`): Generally applying `promote_op`'s present behavior seems suboptimal (e.g. having `round.(Int, Real[0])` return `Vector{Real}` rather than `Vector{Int}`), and what a better container-abstract-eltype preservation rule would say is not clear to me. In contrast, `broadcast`'s eltype algorithm is simple, clear, and mostly works well (notwithstanding the troublesome empty case). On the other hand, `(Real[0] + Real[0])::Vector{Real}` certainly has its appeal.
Thoughts? Thanks and best!
To me, changing the behavior of + for dense matrices to be in line with broadcast(+, ...) (i.e. returning Vector{Int} for the example above) looks like the only option that has the potential of being consistent and maintainable. Maybe we should open a dedicated issue for further discussion?
> Maybe we should open a dedicated issue for further discussion?

Sounds good! Best!
Issue is at JuliaLang/julia#19669.
While it would be ideal not to be changing return types of these operations post-feature-freeze, if the current behavior is considered buggy / incomplete then this can hopefully be worked on after feature freeze but prior to / during RCs. Probably not absolutely release-blocking?
Is this not handled by the new broadcast, @mbauman?
No, unfortunately not. The default dense case is fixed, but the sparse infrastructure is still built around relying on inference. It's a pretty major refactor to fix it as I recall.