SparseArrays.jl
SparseArrays.jl copied to clipboard
sprand with function argument is confusing
sprand accepts a function argument rfn that is used to generate the non zero values of a sparse random matrix or vector. I find the calling convention with rfn to be inconsistent and confusing.
julia> sprand(0,0,1.0,rand)
0×0 SparseMatrixCSC{Float64,Int64} with 0 stored entries
julia> sprand(1,1,1.0,rand)
1×1 SparseMatrixCSC{Float64,Int64} with 1 stored entry:
[1, 1] = 0.686873
julia> sprand(1,1,1.0,i->fill(1.0,i))
1×1 SparseMatrixCSC{Float64,Int64} with 1 stored entry:
[1, 1] = 1.0
So far so good. What about generating Float32 values? The function accepts a type parameter
julia> sprand(0,0,1.0,rand,Float32)
0×0 SparseMatrixCSC{Float32,Int64} with 0 stored entries
julia> sprand(1,1,1.0,rand,Float32)
1×1 SparseMatrixCSC{Float64,Int64} with 1 stored entry:
[1, 1] = 0.411008
julia> sprand(1,1,1.0,i->fill(1.0,i),Float32)
1×1 SparseMatrixCSC{Float64,Int64} with 1 stored entry:
[1, 1] = 1.0
so the type parameter is ignored for non 0x0 matrices (rendering the method effectively type unstable by the way). This seems a bug, and is definitely ugly.
Moreover, sprand accepts also a RNG first argument:
using Random; r=Random.MersenneTwister(1);
julia> sprand(r,1,1,1.0,rand,Float32)
1×1 SparseMatrixCSC{Float64,Int64} with 1 stored entry:
[1, 1] = 0.644883
julia> sprand(r,1,1,1.0,i->fill(1.0,i),Float32)
ERROR: MethodError: no method matching (::getfield(Main, Symbol("##9#10")))(::MersenneTwister, ::Int64)
Closest candidates are:
JuliaLang/julia#9(::Any) at REPL[36]:1
Stacktrace:
[1] sprand(::MersenneTwister, ::Int64, ::Int64, ::Float64, ::getfield(Main, Symbol("##9#10")), ::Type{Float32}) at /home/ab/src/julia/build1.x/usr/share/julia/stdlib/v1.2/SparseArrays/src/sparsematrix.jl:1448
[2] top-level scope at none:0
Here we learn that the signature for the passed method should change, now accepting r as well. I find this overly convoluted (I pass a function that gets passed an argument that I also pass: if I supply the function, I can pass it myself) and a bit inconsistent (the signature of the required rfn depends on other arguments).
Another inconsistency: without rfn, the type parameter can be also specified, but it goes as first argument instead of last.
Of course, most of these could be clarified in the documentation (now it is a bit terse):
sprand([rng],[type],m,[n],p::AbstractFloat,[rfn])
Create a random length m sparse vector or m by n sparse matrix, in which the probability of any element being nonzero is independently
given by p (and hence the mean density of nonzeros is also exactly p). Nonzero values are sampled from the distribution specified by
rfn and have the type type. The uniform distribution is used in case rfn is not specified. The optional rng argument specifies a
random number generator, see Random Numbers.
Examples
≡≡≡≡≡≡≡≡≡≡
julia> sprand(Bool, 2, 2, 0.5)
2×2 SparseMatrixCSC{Bool,Int64} with 2 stored entries:
[1, 1] = true
[2, 1] = true
julia> sprand(Float64, 3, 0.75)
3-element SparseVector{Float64,Int64} with 1 stored entry:
[3] = 0.298614
But I really think that the calling convention could be improved very easily. I propose to change the behaviour in the following way: whenever a function is passed, this function accepts always only one argument (the number of values to generate) and no type can be specified. In this way, the caller can supply the function (using a random generator or not) he wants and generating the type of values he want (that will become the Tv type of the matrix). A patch for this is very simple, I can submit it if there is interest. I suppose it will have to wait until 2.x for merging as it will be breaking though.
I totally agree with your remarks; i had in mind to open an issue about at least documenting the requirements on the rfn parameter, which is too confusing as you demonstrated.
My personal bias would lead me to simply remove this parameter, it's inconsistent with other rand functions, but I understand that some people may rely on that. My hope is that we find a design that allows uniform specifying of distributions across rand-related functions, and remove this specificity from sprand. Cf. JuliaLang/julia#24912 for a tentative in this direction.
I totally agree with your remarks; i had in mind to open an issue about at least documenting the requirements on the
rfnparameter, which is too confusing as you demonstrated. My personal bias would lead me to simply remove this parameter, it's inconsistent with otherrandfunctions, but I understand that some people may rely on that. My hope is that we find a design that allows uniform specifying of distributions acrossrand-related functions, and remove this specificity fromsprand. Cf. JuliaLang/julia#24912 for a tentative in this direction.
JuliaLang/julia#24912 seems nice, I will give it a try later. I don't dislike having the rfn parameter (in the current state of things), as it allows to completely separate the "structure" part from the "value" part (a feature which in a way is emphasized in JuliaLang/julia#24912 if I understand correctly).
See also #309