Ridge Regularization in linear solve: consistency
It will be easier to discuss this on GitHub rather than in an internal Google doc.
The issue
Current state of the API:
| Function | Without ridge | With ridge r > 0 | Remark |
|---|---|---|---|
| solve_cg | Ax=b | (A+rI)x=b | well-posed because A is PSD |
| solve_gmres | Ax=b | (A+rI)x=b | ill-posed if A=-rI |
| solve_bicgstab | Ax=b | (A+rI)x=b | ill-posed if A=-rI |
| solve_normal_cg | A^TAx=A^Tb | (A^TA+rI)x=A^Tb | well posed because A^TA is PSD |
There is a consistency issue here: if solve_normal_cg followed the same convention as the other solvers (i.e. applied the normal equations to the regularized system (A+rI)x=b), it would solve (A^T+rI)(A+rI)x=(A^T+rI)b. Consequently solve_cg, solve_gmres and solve_bicgstab are all interchangeable when r > 0, but solve_normal_cg is not interchangeable with them. Worse: when r=0 all four are interchangeable with each other (at least for PD matrices), so the inconsistency only appears once regularization is turned on.
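A small numpy sketch of the inconsistency, using direct solves as a stand-in for the iterative solvers (the matrix, seed and ridge value are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4))
A = A @ A.T + np.eye(4)  # make A PD so every solver applies
b = rng.normal(size=4)
r = 0.5

# What solve_cg / solve_gmres / solve_bicgstab currently solve: (A + rI) x = b
x_shifted = np.linalg.solve(A + r * np.eye(4), b)

# What solve_normal_cg currently solves: (A^T A + rI) x = A^T b
x_normal = np.linalg.solve(A.T @ A + r * np.eye(4), A.T @ b)

# For r > 0 the two answers disagree; at r = 0 both reduce to A^{-1} b.
print(np.allclose(x_shifted, x_normal))  # False
```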
Discussion:
Tikhonov regularization
regularizes with A^TA+rI, just like solve_normal_cg. This guarantees a well-posed problem.
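To make the connection explicit: (A^TA+rI)x=A^Tb is exactly the normal equation of the ridge least-squares problem min_x ||Ax-b||^2 + r||x||^2. A quick numpy check (sizes and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(6, 3))
b = rng.normal(size=6)
r = 0.1

# Tikhonov / ridge solution via its normal equations: (A^T A + rI) x = A^T b
x_ridge = np.linalg.solve(A.T @ A + r * np.eye(3), A.T @ b)

# Same solution from the equivalent augmented least-squares problem
#   min_x || [A; sqrt(r) I] x - [b; 0] ||^2
A_aug = np.vstack([A, np.sqrt(r) * np.eye(3)])
b_aug = np.concatenate([b, np.zeros(3)])
x_lstsq, *_ = np.linalg.lstsq(A_aug, b_aug, rcond=None)

print(np.allclose(x_ridge, x_lstsq))  # True
```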
Another observation: most scikit-learn solvers for Ridge regression use the A^TA+rI trick.
No one uses A+rI on a general matrix A: it only makes sense to do so on a PSD matrix.
Solution
Two solutions:
- Change (A^TA+rI)x=A^Tb into (A^T+rI)(A+rI)x=(A^T+rI)b, but in this case the problem is ill-posed for A=-rI.
- Remove ridge regularization from solve_gmres/solve_bicgstab, because the current regularization may produce a matrix with a worse condition number (unexpected behavior when we regularize a system). This happens in particular for NSD matrices.
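The conditioning concern can be seen on a tiny NSD example (the diagonal entries and r are arbitrary illustrative values):

```python
import numpy as np

r = 1.0
# A well-conditioned negative-definite matrix
A = -np.diag([0.9, 1.5, 2.0])
# The A + rI shift moves an eigenvalue close to zero, so conditioning degrades
print(np.linalg.cond(A))                  # ~2.2
print(np.linalg.cond(A + r * np.eye(3)))  # 10.0
```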
I am in favor of the second option, to remain consistent with the literature, unless we can prove that the A+rI approach makes sense.