jaxopt icon indicating copy to clipboard operation
jaxopt copied to clipboard

Ridge Regularization in linear solve: consistency

Open Algue-Rythme opened this issue 4 years ago • 0 comments

It will be easier to discuss this on github rather than internal Google doc.

The issue

Current state of API:

Function Without ridge With ridge r > 0 Remark
solve_cg Ax=b (A+rI)=b well posed because A is PSD
solve_gmres Ax=b (A+rI)=b ill-posed if A=-rI
solve_bicgstab Ax=b (A+rI)=b ill-posed if A=-rI
solve_normal_cg A^TAx=A^Tb (A^TA+rI)x=A^Tb well posed because A^TA is PSD

There are consistency issues here: with ridge regularization we expect (A^T+rI)(A+rI)x=(A^T+rI)b for solve_normal_cg. Consequently all of solve_cg, solve_gmres and solve_bicgstab are interchangeable when r > 0, but not with solve_nornal_cg. Worse: when r=0 they are all interchangeable with each other (at least for PD matrices).

Discussion:

Tikhonov regularization regularizes with A^TA+rI - just like solve_cg. This guarantees a well posed problem.

Other observation: most solvers of Sklearn for Ridge regression uses the A^TA+rI trick.
No one uses A+rI on a general matrix A: it only makes sense to do so on PSD matrix in general.

Solution

Two solutions:

  1. Change (A^TA+rI)x=A^Tb into (A^T+rI)(A+rI)x=(A^T+rI)b, but in this case the problem is ill-posed for A=-rI.
  2. Remove ridge regularization from gmres/bicgstab because currently the regularization may lead to matrix with worse condition number (unexpected behavior when we regularize a system). This happens in particular for NSD matrices.

I am in favor of the second option to remain consistent with literature; unless we can prove that the A+rI approach makes sense.

Algue-Rythme avatar Dec 16 '21 15:12 Algue-Rythme