
Hessian decompression implementation

Open ElOceanografo opened this issue 3 years ago • 3 comments

Continuing from this Discourse discussion, I've implemented a method for decompressing a Hessian matrix from its coloring:

using SparsityDetection
using SparseDiffTools
using ForwardDiff
using LinearAlgebra
using SparseArrays

function sparsehess(f, g, hs, col, x)
    # Seed matrix: one column per color, summing the unit vectors of that color
    D = hcat([float.(i .== col) for i in 1:maximum(col)]...)
    δ = 1e-8
    ii, jj, vv = findnz(triu(hs))  # stored entries of the upper triangle
    gx = g(x)                      # evaluate the base gradient once
    Hc = hcat([(g(x + δ * D[:, j]) - gx) / δ for j in 1:maximum(col)]...)
    H = sparse(ii, jj, zeros(length(vv)), size(hs)...)  # pass dims explicitly
    for (i, j) in zip(ii, jj)
        H[i, j] = Hc[i, col[j]]    # decompress: read each entry from its color's column
    end
    return Symmetric(H)
end
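To make the decompression step concrete, here is a minimal, self-contained sketch of the same idea on a 4×4 tridiagonal Hessian with a hand-picked coloring. The quadratic test function, analytic gradient, and coloring are illustrative choices only, not part of the proposed API:

```julia
using LinearAlgebra, SparseArrays

# Quadratic test function f(x) = x'Ax/2, so the Hessian is exactly A
A = sparse(SymTridiagonal([4.0, 4.0, 4.0, 4.0], [1.0, 1.0, 1.0]))
g(x) = A * x                        # analytic gradient

# Hand-picked coloring: no row of triu(A) sees two columns of the same color
col = [1, 2, 3, 1]
ncol = maximum(col)
D = hcat([float.(col .== c) for c in 1:ncol]...)   # seed matrix, one column per color

x = randn(4)
δ = 1e-8
# Compressed Hessian: n × ncol, each column is a directional gradient difference
Hc = hcat([(g(x + δ * D[:, c]) - g(x)) / δ for c in 1:ncol]...)

ii, jj, _ = findnz(triu(A))
H = sparse(ii, jj, zeros(length(ii)), 4, 4)
for (i, j) in zip(ii, jj)
    H[i, j] = Hc[i, col[j]]         # read each stored entry out of its color's column
end
Hsym = Symmetric(H)
```

Because `g` is linear here, the finite difference recovers `A` up to floating-point roundoff, which makes it easy to check the decompression logic in isolation.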

This can be polished, but I've got a couple of questions before working it up as a PR:

  • Where/how should this go in the repo? Is there analogous Jacobian code I should model it off of?
  • What methods for calculating the gradients should there be?
  • On Discourse @ChrisRackauckas suggested using star-2 coloring, but in some testing I just did, it seems like the default GreedyD1Color is usually faster and more compact on Hessians with random sparsity patterns. Thoughts?
using Statistics  # `mean` is used below to summarize the timings

ntest = 500
nnzs = zeros(ntest)
coloring_timings = zeros(ntest, 2)
decompression_timings = zeros(ntest, 2)
ncolors = zeros(ntest, 2)
max_errors = zeros(ntest, 2)

colormethods = [SparseDiffTools.GreedyD1Color(), SparseDiffTools.GreedyStar2Color()]
for i in 1:ntest
    nx = 100
    xis = rand(1:nx, nx)
    xjs = rand(1:nx, nx)
    f(x) = dot(x, x) + dot(x[xis], x[xjs])
    g(x) = ForwardDiff.gradient(f, x)
    x = randn(nx)
    hs = hessian_sparsity(f, x)
    nnzs[i] = nnz(hs)
    Hfd = ForwardDiff.hessian(f, x)
    for j in 1:2
        coloring_timings[i, j] = @elapsed col = matrix_colors(hs, colormethods[j])
        ncolors[i, j] = maximum(col)
        sparsehess(f, g, hs, col, x)  # warm-up call to exclude compilation time
        t1 = time_ns()
        H = sparsehess(f, g, hs, col, x)
        t2 = time_ns()
        decompression_timings[i, j] = t2 - t1  # nanoseconds
        max_errors[i, j] = maximum(abs.(Matrix(H) .- Hfd))  # max absolute error vs. dense Hessian
    end
end

mnc = mean(ncolors, dims=1); mnc[2] / mnc[1]  # 2.3
mct = mean(coloring_timings, dims=1); mct[2] / mct[1] # 4.78
mdt = mean(decompression_timings, dims=1); mdt[2] / mdt[1] # 2.25

using StatsPlots
density(log10.(coloring_timings), label=["D1" "Star-2"],
    xlabel="log10(Coloring time)")
density(ncolors, label=["D1" "Star-2"],
    xlabel="Number of colors")
density(log10.(decompression_timings), label=["D1" "Star-2"],
    xlabel="log10(Decompression time)")
density(max_errors, label=["D1" "Star-2"],
    xlabel="Max. error re. ForwardDiff.hessian")

ElOceanografo avatar May 29 '21 22:05 ElOceanografo

On Discourse @ChrisRackauckas suggested using star-2 coloring, but in some testing I just did, it seems like the default GreedyD1Color is usually faster and more compact on Hessians with random sparsity patterns. Thoughts?

Alright, then let's go with that.

What methods for calculating the gradients should there be?

FiniteDiff, ForwardDiff, and Zygote, for now. AbstractDifferentiation.jl should come in a year and greatly simplify that.
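Since `sparsehess` only needs a gradient closure `g(x)`, the backends are interchangeable. A quick sketch of the three options: the ForwardDiff and Zygote calls are shown as comments, and a hand-rolled forward-difference gradient (standing in for `FiniteDiff.finite_difference_gradient`) is used as the runnable stand-in, since it needs only the standard library:

```julia
using LinearAlgebra

f(x) = dot(x, x) + x[1] * x[2]

# g(x) = ForwardDiff.gradient(f, x)      # ForwardDiff (forward-mode AD)
# g(x) = first(Zygote.gradient(f, x))    # Zygote (reverse-mode AD)

# Simple forward-difference gradient, a stand-in for FiniteDiff's version
function fd_gradient(f, x; δ=1e-8)
    g = similar(x)
    fx = f(x)
    for i in eachindex(x)
        xp = copy(x)
        xp[i] += δ
        g[i] = (f(xp) - fx) / δ
    end
    return g
end

g(x) = fd_gradient(f, x)
x = [1.0, 2.0, 3.0]
gx = g(x)  # ≈ [2x₁ + x₂, 2x₂ + x₁, 2x₃] = [4, 5, 6]
```

Any of the three closures can be passed as the `g` argument to `sparsehess` unchanged.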

Where/how should this go in the repo? Is there analogous Jacobian code I should model it off of?

https://github.com/JuliaDiff/SparseDiffTools.jl/blob/master/src/differentiation/compute_jacobian_ad.jl

It would be good to make an allocating and a non-allocating (pre-cached) form (noting though that reverse-mode AD will allocate).
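One possible shape for the pre-cached form, assuming the same decompression scheme as `sparsehess` above. The names `HessianCache` and `sparsehess!` are hypothetical here, not an existing SparseDiffTools API; the point is just that the seed matrix, index vectors, and work buffers can all be allocated once up front:

```julia
using LinearAlgebra, SparseArrays

# Hypothetical cache holding everything that depends only on the sparsity
# pattern and coloring, so repeated Hessian evaluations reuse the buffers.
struct HessianCache{T}
    D::Matrix{T}       # seed matrix, one column per color
    Hc::Matrix{T}      # compressed Hessian buffer, n × ncolors
    gx::Vector{T}      # gradient at the base point
    xpert::Vector{T}   # perturbed-point buffer
    ii::Vector{Int}    # row indices of stored upper-triangular entries
    jj::Vector{Int}    # column indices
    col::Vector{Int}   # column coloring
end

function HessianCache(hs::SparseMatrixCSC, col::Vector{Int})
    n, ncol = size(hs, 1), maximum(col)
    D = [float(col[k] == c) for k in 1:n, c in 1:ncol]
    ii, jj, _ = findnz(triu(hs))
    return HessianCache(D, zeros(n, ncol), zeros(n), zeros(n), ii, jj, col)
end

# Fill a preallocated sparse H in place; only the gradient call may allocate
# (which, as noted above, is unavoidable for reverse-mode AD).
function sparsehess!(H, g, x, cache::HessianCache; δ=1e-8)
    cache.gx .= g(x)
    for c in 1:size(cache.D, 2)
        cache.xpert .= x .+ δ .* view(cache.D, :, c)
        cache.Hc[:, c] .= (g(cache.xpert) .- cache.gx) ./ δ
    end
    for (i, j) in zip(cache.ii, cache.jj)
        H[i, j] = cache.Hc[i, cache.col[j]]
    end
    return Symmetric(H)
end
</imports>
```

Usage would be: build the cache once from the sparsity pattern and coloring, preallocate `H = sparse(cache.ii, cache.jj, zeros(length(cache.ii)), n, n)`, then call `sparsehess!(H, g, x, cache)` inside the hot loop.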

ChrisRackauckas avatar May 29 '21 22:05 ChrisRackauckas

But don't worry about the generality of the code. PR a first version, let someone else follow up with improvements, etc. Open source is an iterative process, and it can be worrisome to think about non-allocating GPU-accelerated etc etc, but take it one step at a time.

ChrisRackauckas avatar May 29 '21 22:05 ChrisRackauckas

Ok, will do. I actually already made a pre-cached version in MarginalLogDensities, so I'll see if I can adapt that.

ElOceanografo avatar May 29 '21 22:05 ElOceanografo