SparseArrays.jl icon indicating copy to clipboard operation
SparseArrays.jl copied to clipboard

Faster collect

Open KnutAM opened this issue 2 years ago • 1 comments

For vector sv=sprand(n,0.2):

Vector: n = 16
master:   119.365 ns (1 allocation: 192 bytes)
pr:       45.455 ns (1 allocation: 192 bytes)
Vector: n = 65536
master:   2.682 ms (2 allocations: 512.05 KiB)
pr:       37.200 μs (2 allocations: 512.05 KiB)
Benchmark code with more cases
julia> using SparseArrays, BenchmarkTools
julia> function collect_pr(x::Union{AbstractSparseVector,AbstractSparseMatrix})
           if Base.has_offset_axes(x)
               return Base._collect_indices(axes(x), x)
           else
               return Array(x)
           end
       end;
julia> for n in [16^i for i in 1:4]
        sv = sprand(n, 0.2)
        println("Vector: n = $n")
        print("master: "); @btime collect($sv)
        print("pr:     "); @btime collect_pr($sv)
        if n <= 16^3
         println("Matrix: n = $n")
         sm = sprand(n, n, 0.2)
         print("master: "); @btime collect($sm)
         print("pr:     "); @btime collect_pr($sm)
        end
       end
Vector: n = 16
master:   119.365 ns (1 allocation: 192 bytes)
pr:       45.455 ns (1 allocation: 192 bytes)
Matrix: n = 16
master:   660.606 ns (1 allocation: 2.12 KiB)
pr:       660.355 ns (1 allocation: 2.12 KiB)
Vector: n = 256
master:   2.456 μs (1 allocation: 2.12 KiB)
pr:       365.502 ns (1 allocation: 2.12 KiB)
Matrix: n = 256
master:   120.700 μs (2 allocations: 512.05 KiB)
pr:       121.700 μs (2 allocations: 512.05 KiB)
Vector: n = 4096
master:   108.600 μs (2 allocations: 32.05 KiB)
pr:       2.556 μs (2 allocations: 32.05 KiB)
Matrix: n = 4096
master:   64.762 ms (2 allocations: 128.00 MiB)
pr:       63.805 ms (2 allocations: 128.00 MiB)
Vector: n = 65536
master:   2.682 ms (2 allocations: 512.05 KiB)
pr:       37.200 μs (2 allocations: 512.05 KiB)

Not sure how to avoid using Base._collect_indices without just copying that code. Perhaps the best option is implementing this faster collect only for SparseVector and SparseMatrixCSC (if below is included) to alleviate the need for that branch.

For the matrix, there is no change as the constructor method for Matrix(::AbstractSparseMatrix) doesn't seem optimized like the vector one... I could get 1.5-2.5 times speedup with the following implementation inspired by the Vector constructor. Would that be of interest to include in the pr?

function fMatrix(S::SparseMatrixCSC{Tv}) where Tv
    Base.has_offset_axes(S) && return Matrix(S) # Only needed if typeof(S) is generalized
    n, m = size(S)
    M = zeros(Tv, n, m);
    isempty(M) && return M
    colptr = SparseArrays.getcolptr(S)
    rowval = SparseArrays.getrowval(S)
    nzval = SparseArrays.getnzval(S)
    for col in 1:length(colptr)-1
        for i in colptr[col]:(colptr[col+1]-1)
            row = rowval[i]
            val = nzval[i]
            M[row,col] = val
        end
    end
    return M
end

KnutAM avatar Dec 05 '22 18:12 KnutAM

Codecov Report

Merging #312 (0652c41) into main (57cbb74) will decrease coverage by 0.01%. The diff coverage is 75.00%.

@@            Coverage Diff             @@
##             main     #312      +/-   ##
==========================================
- Coverage   93.62%   93.61%   -0.02%     
==========================================
  Files          12       12              
  Lines        7390     7394       +4     
==========================================
+ Hits         6919     6922       +3     
- Misses        471      472       +1     
Impacted Files Coverage Δ
src/sparsevector.jl 94.95% <75.00%> (-0.07%) :arrow_down:

:mega: We’re building smart automated test selection to slash your CI/CD build times. Learn more

codecov-commenter avatar Dec 05 '22 18:12 codecov-commenter

@KnutAM Can you make a PR for the Matrix(SparseMatrixCSC) case?

ViralBShah avatar Jan 17 '23 10:01 ViralBShah