aslam_optimizer
M-Estimator weighting is sub-optimal.
We should use the method below (from the Ceres mailing list).
https://groups.google.com/forum/#!msg/ceres-solver/CbpQ32dyvvk/arKGcbjOmqwJ
Gim,
Here is a not entirely rigorous derivation, but it should get the point across.
Let us start with the objective function

f = 1/2 rho(z'z)

where z = z(x) is the residual vector and J = dz/dx its Jacobian.
Differentiating it once and twice we get the gradient and Hessian as

g = rho' J'z
H = 2 rho'' J'zz'J + rho' J'J + O(second order)
Dropping the second-order terms we get

g = rho' J'z
H = J'(rho' I + 2 rho'' zz')J
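For reference, here is a sketch of the two differentiation steps written out via the chain rule (rho' and rho'' are evaluated at z'z; the last term is the second-order part being dropped):

```latex
% Chain rule on f(x) = 1/2 * rho( z(x)' z(x) ), with J = dz/dx:
\begin{aligned}
g &= \tfrac{1}{2}\,\rho'(z^\top z)\,\nabla_x(z^\top z)
   = \rho'\, J^\top z,\\
H &= \nabla_x\!\left(\rho'\, J^\top z\right)
   = 2\rho''\, J^\top z z^\top J + \rho'\, J^\top J
     + \rho' \textstyle\sum_i z_i \nabla_x^2 z_i .
\end{aligned}
```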
Now our aim is to come up with a rescaled Jacobian ~J such that ~J'~J is as close to H as possible. For reasonable robust loss functions rho, rho' is positive and rho'' is negative; for example, for the Cauchy loss rho(s) = log(1 + s) we have rho' = 1/(1 + s) > 0 and rho'' = -1/(1 + s)^2 < 0.
Now, let us assume that we are scaling J by a matrix A (that is, ~J = AJ), and compare the resulting "scaled" Gauss-Newton Hessian

J'A'AJ

with

J'(rho' I + 2 rho'' zz')J
Now, assuming that the matrix sandwiched between J' and J is not indefinite, we can equate

rho' I + 2 rho'' zz' = A'A
If we divide everything by rho' (and assume its positivity), things become a little clearer:

I + 2 (rho''/rho') zz' = B'B, where B = A/sqrt(rho')
Since for most reasonable robust loss functions rho'' is negative, we are subtracting a rank-one matrix from the identity. We already know the direction along which the subtraction takes place, namely z, so we can ask:
if B = I - alpha hat{z}hat{z}'

for some scalar alpha, what would its value be? Here hat{z} is the unit vector along z. Since hat{z}hat{z}' is idempotent,
B'B = I - 2 alpha hat{z}hat{z}' + alpha^2 hat{z}hat{z}' = I + (alpha^2 - 2 alpha) hat{z}hat{z}'
Comparing coefficients,

alpha^2 - 2 alpha = 2 (rho''/rho') |z|^2
which is the required equation. Taking the root that vanishes when rho'' = 0 gives alpha = 1 - sqrt(1 + 2 (rho''/rho') |z|^2). And yes, this performs much better than just re-weighting by a scalar.
Sameer
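The email pins down the Jacobian scaling A = sqrt(rho')(I - alpha hat{z}hat{z}'); to also reproduce the gradient g = rho' J'z, the residual has to be rescaled as z <- sqrt(rho')/(1 - alpha) z, which is what Ceres does in its corrector. Here is a minimal sketch of what this could look like for one of our error terms, using Eigen; the function name applyTriggsCorrection and its signature are made up for illustration:

```cpp
#include <algorithm>
#include <cmath>
#include <Eigen/Core>

// Hypothetical helper (name and signature made up for illustration):
// rescale the residual z and Jacobian J in place so that plain Gauss-Newton
// on the corrected pair reproduces the robustified gradient g = rho' J'z and
// the approximate Hessian H = J'(rho' I + 2 rho'' zz')J derived above.
//
//   rho1 = rho'(|z|^2)   (assumed > 0)
//   rho2 = rho''(|z|^2)  (<= 0 for the usual robust losses)
void applyTriggsCorrection(double rho1, double rho2,
                           Eigen::VectorXd& z, Eigen::MatrixXd& J) {
  const double sqrt_rho1 = std::sqrt(rho1);
  const double sq_norm = z.squaredNorm();

  // If rho'' >= 0 (e.g. a non-robust quadratic loss) the rank-one
  // correction does not apply; plain scalar reweighting suffices.
  if (rho2 >= 0.0 || sq_norm == 0.0) {
    z *= sqrt_rho1;
    J *= sqrt_rho1;
    return;
  }

  // Solve alpha^2 - 2 alpha = 2 (rho''/rho') |z|^2, taking the root that
  // vanishes when rho'' = 0. Clamping the discriminant keeps the residual
  // scaling below finite for extreme outliers.
  const double D = std::max(1.0 + 2.0 * sq_norm * rho2 / rho1, 1e-12);
  const double alpha = 1.0 - std::sqrt(D);

  // Jacobian: J <- sqrt(rho') (I - alpha zhat zhat') J, applied as a
  // rank-one update without forming the outer-product matrix.
  const Eigen::RowVectorXd zTJ = z.transpose() * J;
  J = sqrt_rho1 * (J - (alpha / sq_norm) * z * zTJ);

  // Residual: z <- sqrt(rho')/(1 - alpha) z, so that J'z of the corrected
  // pair equals rho' J'z, the true gradient.
  z *= sqrt_rho1 / (1.0 - alpha);
}
```

Compared to plain scalar reweighting (z and J both scaled by sqrt(rho')), the extra rank-one term is what carries the rho'' curvature information into the normal equations, which is where the improvement Sameer mentions comes from.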