SparseArrays.jl sprand with function argument is confusing

sprand accepts a function argument rfn that is used to generate the non zero values of a sparse random matrix or vector. I find the calling convention with rfn to be inconsistent and confusing.

julia> sprand(0,0,1.0,rand)
0×0 SparseMatrixCSC{Float64,Int64} with 0 stored entries

julia> sprand(1,1,1.0,rand)
1×1 SparseMatrixCSC{Float64,Int64} with 1 stored entry:
  [1, 1]  =  0.686873

julia> sprand(1,1,1.0,i->fill(1.0,i))
1×1 SparseMatrixCSC{Float64,Int64} with 1 stored entry:
  [1, 1]  =  1.0

So far so good. What about generating Float32 values? The function accepts a type parameter

julia> sprand(0,0,1.0,rand,Float32)
0×0 SparseMatrixCSC{Float32,Int64} with 0 stored entries

julia> sprand(1,1,1.0,rand,Float32)
1×1 SparseMatrixCSC{Float64,Int64} with 1 stored entry:
  [1, 1]  =  0.411008

julia> sprand(1,1,1.0,i->fill(1.0,i),Float32)
1×1 SparseMatrixCSC{Float64,Int64} with 1 stored entry:
  [1, 1]  =  1.0

so the type parameter is ignored for non 0x0 matrices (rendering the method effectively type unstable by the way). This seems a bug, and is definitely ugly.

Moreover, sprand accepts also a RNG first argument:

using Random; r=Random.MersenneTwister(1);

julia> sprand(r,1,1,1.0,rand,Float32)
1×1 SparseMatrixCSC{Float64,Int64} with 1 stored entry:
  [1, 1]  =  0.644883

julia> sprand(r,1,1,1.0,i->fill(1.0,i),Float32)
ERROR: MethodError: no method matching (::getfield(Main, Symbol("##9#10")))(::MersenneTwister, ::Int64)
Closest candidates are:
  JuliaLang/julia#9(::Any) at REPL[36]:1
Stacktrace:
 [1] sprand(::MersenneTwister, ::Int64, ::Int64, ::Float64, ::getfield(Main, Symbol("##9#10")), ::Type{Float32}) at /home/ab/src/julia/build1.x/usr/share/julia/stdlib/v1.2/SparseArrays/src/sparsematrix.jl:1448
 [2] top-level scope at none:0

Here we learn that the signature for the passed method should change, now accepting r as well. I find this overly convoluted (I pass a function that gets passed an argument that I also pass: if I supply the function, I can pass it myself) and a bit inconsistent (the signature of the required rfn depends on other arguments).

Another inconsistency: without rfn, the type parameter can be also specified, but it goes as first argument instead of last.

Of course, most of these could be clarified in the documentation (now it is a bit terse):

  sprand([rng],[type],m,[n],p::AbstractFloat,[rfn])

  Create a random length m sparse vector or m by n sparse matrix, in which the probability of any element being nonzero is independently
  given by p (and hence the mean density of nonzeros is also exactly p). Nonzero values are sampled from the distribution specified by
  rfn and have the type type. The uniform distribution is used in case rfn is not specified. The optional rng argument specifies a
  random number generator, see Random Numbers.

  Examples
  ≡≡≡≡≡≡≡≡≡≡

  julia> sprand(Bool, 2, 2, 0.5)
  2×2 SparseMatrixCSC{Bool,Int64} with 2 stored entries:
    [1, 1]  =  true
    [2, 1]  =  true
  
  julia> sprand(Float64, 3, 0.75)
  3-element SparseVector{Float64,Int64} with 1 stored entry:
    [3]  =  0.298614

But I really think that the calling convention could be improved very easily. I propose to change the behaviour in the following way: whenever a function is passed, this function accepts always only one argument (the number of values to generate) and no type can be specified. In this way, the caller can supply the function (using a random generator or not) he wants and generating the type of values he want (that will become the Tv type of the matrix). A patch for this is very simple, I can submit it if there is interest. I suppose it will have to wait until 2.x for merging as it will be breaking though.

Jan 07 '19 09:01 abraunst

I totally agree with your remarks; i had in mind to open an issue about at least documenting the requirements on the rfn parameter, which is too confusing as you demonstrated. My personal bias would lead me to simply remove this parameter, it's inconsistent with other rand functions, but I understand that some people may rely on that. My hope is that we find a design that allows uniform specifying of distributions across rand-related functions, and remove this specificity from sprand. Cf. JuliaLang/julia#24912 for a tentative in this direction.

Jan 07 '19 10:01 rfourquet

I totally agree with your remarks; i had in mind to open an issue about at least documenting the requirements on the rfn parameter, which is too confusing as you demonstrated. My personal bias would lead me to simply remove this parameter, it's inconsistent with other rand functions, but I understand that some people may rely on that. My hope is that we find a design that allows uniform specifying of distributions across rand-related functions, and remove this specificity from sprand. Cf. JuliaLang/julia#24912 for a tentative in this direction.

JuliaLang/julia#24912 seems nice, I will give it a try later. I don't dislike having the rfn parameter (in the current state of things), as it allows to completely separate the "structure" part from the "value" part (a feature which in a way is emphasized in JuliaLang/julia#24912 if I understand correctly).

Jan 07 '19 11:01 abraunst

SparseArrays.jl SparseArrays.jl copied to clipboard

sprand with function argument is confusing

SparseArrays.jl
SparseArrays.jl copied to clipboard