SparseArrays.jl
Functionality issues with SparseMatrixCSC{T} when Missing<:T
I have a use case for sparse matrices where most values are 0 but some need to be indicated as missing. It seems like some very basic functionality for SparseMatrixCSC{T} is subtly broken or confusing when Missing <: T, and the semantics do not appear to correctly distinguish between unstored values being 0 or missing.
Examples on v1.3 of trying to create a 2x2 sparse matrix [missing 0; 0 0]:
Example 1: sparse(Vector, Vector, scalar)
I think this one is simply a missing method issue, assuming anyone would ever want to create a SparseMatrixCSC{Missing,Int64} (see Example 2).
julia> sparse([1], [1], 1, 2, 2) #create [1 0; 0 0]
2×2 SparseMatrixCSC{Int64,Int64} with 1 stored entry:
[1, 1] = 1
julia> sparse([1], [1], missing, 2, 2) #create [missing 0; 0 0]
ERROR: MethodError: no method matching sparse(::Array{Int64,1}, ::Array{Int64,1}, ::Missing, ::Int64, ::Int64)
...
Example 2: sparse(Vector, Vector, Vector)
This hits a confusing corner case of the literal list constructor and zero(::Missing), creating a result where zero cannot be distinguished from missing.
Ref: JuliaLang/julia#28854 JuliaLang/julia#31303
julia> A = sparse([1], [1], [missing], 2, 2) #resulting type treats unstored elements as missing, not zero; arguably confusing behavior due to zero(Missing) === missing
2×2 SparseMatrixCSC{Missing,Int64} with 1 stored entry:
[1, 1] = missing
julia> A[1,2] # extracts zero of Missing, which is missing
missing
julia> B = sparse([1], [1], Union{Missing,Int}[missing], 2, 2) #works "as expected", but user has to know to supply the type
2×2 SparseMatrixCSC{Union{Missing, Int64},Int64} with 1 stored entry:
[1, 1] = missing
julia> B[1,2]
0
Example 3: setting a previously unstored value to missing
julia> C = sparse([],[],Union{Missing,Int}[], 2, 2)
2×2 SparseMatrixCSC{Union{Missing, Int64},Int64} with 0 stored entries
julia> C[1,1] = missing
ERROR: TypeError: non-boolean (Missing) used in boolean context
Stacktrace:
[1] _setindex_scalar!(::SparseMatrixCSC{Union{Missing, Int64},Int64}, ::Missing, ::Int64, ::Int64) at /Users/sabae/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.3/SparseArrays/src/sparsematrix.jl:2461
[2] setindex!(::SparseMatrixCSC{Union{Missing, Int64},Int64}, ::Missing, ::Int64, ::Int64) at /Users/sabae/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.3/SparseArrays/src/sparsematrix.jl:2442
[3] top-level scope at REPL[95]:1
Example 3 really looks like a bug to me. The problem is that iszero(missing) is missing as of JuliaLang/julia#31303, but the SparseArrays code is not prepared for this ternary "logic": it has to decide whether missing is zero in order to decide whether to store it.
I guess instead of just if iszero(x) one could do if iszero(x) == true but it seems really annoying to have to do that everywhere to support a type with as weird semantics as missing.
The twisted thing is that missing == true is missing so this doesn't help...
EDIT: if iszero(x) === true would work, but I agree it would be (even more) ugly.
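To make the `=== true` idea concrete, here is a minimal sketch. The helper name `is_definitely_zero` is hypothetical and not part of SparseArrays; it only illustrates how the three-valued result of iszero could be collapsed into a Bool, treating missing as "not known to be zero" so that it would be stored.

```julia
# Hypothetical helper (not a SparseArrays API): true only when iszero
# returns the Bool `true`; a `missing` result collapses to false.
is_definitely_zero(x) = iszero(x) === true

is_definitely_zero(0)        # true
is_definitely_zero(1)        # false
is_definitely_zero(missing)  # false, so `missing` would count as a stored value
```

Whether missing should count as nonzero for storage purposes is a design choice; this sketch simply assumes it should.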
EDIT2: Going back to the general issue, these kinds of problems are not exclusive to sparse arrays, of course, and will creep in everywhere; see e.g. something like findfirst(x->x==3,[1,2,missing,3]). The thing is that == is not really even a mathematical relation when used with missing values...
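For illustration, a short sketch of the findfirst case and two possible ways around it. Both rely only on Base functions (isequal and coalesce); which one is appropriate depends on how missing should be treated, so take this as one option rather than a recommendation.

```julia
v = [1, 2, missing, 3]

# The predicate returns `missing` at index 3, which findfirst cannot use
# as a Bool, so this call throws a TypeError:
# findfirst(x -> x == 3, v)

# isequal always returns a Bool; isequal(missing, 3) is false.
findfirst(x -> isequal(x, 3), v)           # 4

# coalesce replaces a `missing` comparison result with false.
findfirst(x -> coalesce(x == 3, false), v) # 4
```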
Hi, I tried to recreate this in Julia 1.6.2 today and the behaviour is not as described.
Is a particular package involved?
SparseArrays
Ah, in that case, yeah: when I loaded the package I got the same behaviour. Sadly it hasn't magically been fixed.