Seth Axen
From #238: [This paper](https://dl.acm.org/doi/abs/10.1145/3382191) (there seems to be a free earlier version [here](https://notendur.hi.is/jonasson/greinar/blas-rmd.pdf)) has a number of reverse-mode rules, in particular for BLAS and LAPACK subroutines. Saw a bunch we...
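For a flavor of what such a rule looks like in ChainRules terms, here's a minimal sketch of a reverse-mode rule for a `gemm`-like product `C = A*B` (`mymul` is a hypothetical wrapper introduced just for illustration; ChainRules.jl already ships a more general rule for `*`):

```julia
using ChainRulesCore

# Hypothetical wrapper standing in for a BLAS gemm call.
mymul(A, B) = A * B

function ChainRulesCore.rrule(::typeof(mymul), A, B)
    C = mymul(A, B)
    function mymul_pullback(C̄)
        ΔC = unthunk(C̄)
        # Standard matrix-product pullbacks: Ā = C̄ B',  B̄ = A' C̄
        return NoTangent(), ΔC * B', A' * ΔC
    end
    return C, mymul_pullback
end
```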
It seems this particular issue is due to the derivative for `sqrt` assuming the input is nonzero.

```julia
julia> ForwardDiff.gradient(x -> sqrt(sum(abs2, x)), [0,0]) # bad
2-element Vector{Float64}:
 NaN
 NaN
```

...
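To see where the `NaN` comes from: the derivative of `sqrt` at zero is `Inf`, and the chain rule then multiplies it by the zero partials of `sum(abs2, x)`, so the propagated dual is `0 * Inf == NaN`:

```julia
julia> using ForwardDiff

julia> ForwardDiff.derivative(sqrt, 0.0)  # 1 / (2 * sqrt(0)) == Inf
Inf

julia> 0.0 * Inf  # the chain-rule product at the origin
NaN
```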
FWIW, finite differences finds that the gradient of `norm` at a zero vector is zero:

```julia
julia> using FiniteDifferences, LinearAlgebra

julia> FiniteDifferences.grad(central_fdm(5, 1), norm, [0.0,0.0])
([-3.8050255348364236e-17, -3.8050255348364236e-17],)
```

So I...
It seems to me `norm` would often be used in an optimization problem, where the optimum is achieved when `norm(...) == 0`, so the `[0,0]` gradient makes sense to me.
And asserting the properties is better than just assuming everything is commutative, because at least an error is thrown instead of derivatives being silently wrong.

> To elaborate: I'm not...
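Concretely, the pattern is just this (a hypothetical rule body; the names are illustrative, not from any existing package):

```julia
using LinearAlgebra

# Hypothetical rule that relies on A and B commuting: assert the property
# up front so a violation raises instead of returning silently wrong derivatives.
function commuting_rule(A, B)
    A * B ≈ B * A || throw(ArgumentError("this rule assumes A and B commute"))
    # ... derivative computation that is only valid for commuting A and B ...
end
```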
The paper linked in the OP gives the Fréchet derivative as the same derivative you would get if you naively ran forward-mode AD on the same matrix exponential...
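A quick way to check that claim numerically: push a tangent `V` through a naive truncated-series `exp` with ForwardDiff and compare against the Fréchet derivative `L(A, V)` obtained from the standard block-matrix identity `exp([A V; 0 A]) = [exp(A) L(A,V); 0 exp(A)]`:

```julia
using ForwardDiff, LinearAlgebra

naive_exp(A) = sum(A^k / factorial(k) for k in 0:20)  # truncated Taylor series

A = randn(4, 4) / 4
V = randn(4, 4)

# Forward-mode derivative of t -> exp(A + t V) at t = 0, i.e. the JVP.
jvp = ForwardDiff.derivative(t -> naive_exp(A + t * V), 0.0)

# Reference Fréchet derivative via the block-matrix identity.
n = size(A, 1)
L = exp([A V; zeros(n, n) A])[1:n, (n + 1):(2n)]

jvp ≈ L  # true, up to series truncation error
```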
I would try using ForwardDiff with [ExponentialUtilities.exponential!](https://exponentialutilities.sciml.ai/dev/matrix_exponentials/#ExponentialUtilities.exponential!) and the `ExpMethodHigham2005Base` method first. If that doesn't work, there are other options. ExponentialAction will be really slow for computing the full matrix exponential.
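Something like this sketch (assuming `ExpMethodHigham2005Base` composes with ForwardDiff's dual numbers; if it throws, fall back to the generic method shown in the next comment):

```julia
using ExponentialUtilities, ForwardDiff

# Differentiate a scalar function of exp(A) with respect to the entries of A.
expsum(A) = sum(exponential!(copyto!(similar(A), A), ExpMethodHigham2005Base()))

ForwardDiff.gradient(expsum, [0.0 1.0; -1.0 0.0])
```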
Coming back to this, here's an example with a benchmark:

```julia
julia> using ExponentialUtilities, ExponentialAction, ForwardDiff, LinearAlgebra

julia> myexp(A) = exponential!(copyto!(similar(A), A), ExpMethodGeneric());

julia> myexp2(A) = ExponentialAction.expv(1, A, I(size(A, 1)));
```

...
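For reference, a gradient through the generic-method definition above would look like this (my sketch, not part of the truncated benchmark):

```julia
# Usage sketch: ExpMethodGeneric is written for generic element types,
# so it should compose with ForwardDiff's Dual numbers.
A = randn(3, 3)
ForwardDiff.gradient(X -> sum(myexp(X)), A)
```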
> Does it make sense that it doesn't pull back the cotangent of the eigenvectors at all?

Take a function that calls `eigen` to get the eigenvalues only and discards...
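For instance (a sketch assuming a Zygote/ChainRules setup with the Hermitian eigendecomposition rules, using `eigvals` as a stand-in for the eigenvalues-only path):

```julia
using Zygote, LinearAlgebra

# Only the eigenvalues reach the output, so the cotangent for the
# eigenvectors entering the decomposition's pullback is zero.
eigsum(A) = sum(abs2, eigvals(Symmetric(A)))

Zygote.gradient(eigsum, [2.0 1.0; 1.0 3.0])
```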
I wonder if the QR rule implied by Seeger et al. in https://arxiv.org/pdf/1710.08717.pdf is more performant than the one in Walter and Lehmann? (they actually define an LQ rule, but...
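For reference, the Seeger et al. rule has roughly this shape for a full-rank square/tall `A = Q*R` (my transcription; signs and conventions should be checked against the paper before use):

```julia
using LinearAlgebra

# Copy the (strict) lower triangle onto the upper triangle, symmetrizing from below.
copyltu(M) = tril(M) + tril(M, -1)'

# Pullback Ā given cotangents Q̄, R̄ of the thin QR factors.
function qr_pullback(Q, R, Q̄, R̄)
    M = R * R̄' - Q̄' * Q
    return (Q̄ + Q * copyltu(M)) / R'    # Ā = (Q̄ + Q copyltu(M)) R⁻ᵀ
end
```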