
tikhonov_from_prior treats zero_indices incorrectly

Open ivarzap opened this issue 5 years ago • 7 comments

https://github.com/glm-tools/pyglmnet/blob/b403bac72f5a227a9c34e38bfef1ff0c937b7720/pyglmnet/utils.py#L85

Should be: `S_inv = 1. / S_inv`

Otherwise, directions with small singular values end up almost unregularized.
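
For readers without the source open, here is a minimal sketch of what I believe the linked lines do (paraphrased; exact names and surrounding code may differ from utils.py), with the proposed change in a comment:

```python
import numpy as np

# Paraphrase of the step in question in tikhonov_from_prior.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))
prior_cov = A @ A.T                  # rank-deficient 5x5 prior covariance
threshold = 1e-4                     # the default cutoff

U, S, V = np.linalg.svd(prior_cov, full_matrices=False)
nonzero_indices = np.where(S > threshold)[0]
zero_indices = np.where(S <= threshold)[0]

S_inv = S.copy()
S_inv[zero_indices] = threshold                       # clip tiny SVs up to the cutoff
S_inv[nonzero_indices] = 1. / S_inv[nonzero_indices]  # current: invert only the large SVs
# Proposed instead: invert everything, so the clipped entries become 1/threshold:
# S_inv = 1. / S_inv
```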

ivarzap avatar Jan 14 '20 17:01 ivarzap

This is the standard way to invert a matrix; it's the Moore-Penrose pseudoinverse. Why do you see a problem with this?

jasmainak avatar Jan 17 '20 11:01 jasmainak

What is the reason to use the Moore-Penrose pseudoinverse for regularization?

In that case, say you have a singular value a bit over the threshold (say 0.0002 with the current defaults). Then the projected space will heavily regularize that direction (as expected). But singular values that do not survive the cutoff (<= threshold) will hardly be regularized at all (only by the tiny quantity = threshold). I see that this could be undesirable behavior when regularizing close-to-singular `prior_cov` matrices.
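
A toy calculation of the asymmetry (my numbers, not from pyglmnet):

```python
threshold = 1e-4               # the default in tikhonov_from_prior
s_above = 2e-4                 # survives the cutoff
s_below = 5e-5                 # fails the cutoff

penalty_above = 1. / s_above   # 5000.0 -> this direction is heavily regularized
penalty_below = threshold      # 0.0001 -> this one is left essentially untouched
print(penalty_above / penalty_below)   # ~5e7: the asymmetry described above
```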

ivarzap avatar Jan 17 '20 15:01 ivarzap

Do you observe this problem in your data? What's the standard way to deal with this? If you just invert a 0 in a singular matrix, it will blow up.

jasmainak avatar Jan 17 '20 16:01 jasmainak

I am not yet using your code to regularize an elastic net model, but I'm planning to implement it very soon.

My issue is of a theoretical character: I wonder in what case one would want the small singular value (SV) directions to be left almost unregularized while the larger SV directions are penalized.

In my current understanding, one introduces a ridge regularizer (a Tikhonov matrix proportional to the identity) that shifts the spectrum of X.T.dot(X) before inversion, stabilizing the regression by strongly suppressing the small-SV directions.
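
To make that concrete (my own illustration, not pyglmnet code): in SVD terms, ridge replaces the pseudoinverse factor 1/s with s/(s^2 + lambda), which goes to zero as s -> 0 instead of blowing up:

```python
# Ridge scales each singular direction by s / (s**2 + lam) rather than 1 / s,
# so small-SV directions are suppressed instead of amplified.
lam = 1e-2
for s in (1.0, 1e-2, 1e-6):
    print(f"s={s:.0e}  pinv={1. / s:.3g}  ridge={s / (s**2 + lam):.3g}")
```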

On the other hand, the computation proposed in tikhonov_from_prior inverts only the sufficiently large singular values of prior_cov, which amounts to adding a bias against the small-SV directions; yet the directions that are really small (<= threshold) are hardly regularized at all. I would like to understand the basis for this use of the regularizer.

In my proposal, since the small SVs are first shifted up to threshold, inverting those entries as well would make the Tikhonov matrix very large (~1e4 with the default threshold) in those directions, and, for all practical purposes, they would disappear from the regression.
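
Concretely, with the same hypothetical names as in the sketch above:

```python
threshold = 1e-4
s_small = 5e-5                      # a singular value below the cutoff

clipped = max(s_small, threshold)   # shifted up to the threshold first
current_penalty = clipped           # 1e-4: the direction is barely penalized
proposed_penalty = 1. / clipped     # 1e4: the direction effectively drops out
```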

ivarzap avatar Jan 17 '20 17:01 ivarzap

@pavanramkumar do you have any comments here?

jasmainak avatar Jan 22 '20 17:01 jasmainak

@ivarzap thanks for your question.

  • What we are calling the Tikhonov matrix is the matrix square root of the inverse of the prior covariance matrix. In other words, the inner product of the Tikhonov matrix with itself is the inverse of the covariance matrix. Notation from Wikipedia: https://en.wikipedia.org/wiki/Tikhonov_regularization

  • We are using SVD to compute the inverse of the covariance matrix. If the covariance matrix is not full rank, there will be some singular values very close to zero that prevent inverting the diagonal matrix S by simply reciprocating all of its entries. As @jasmainak said above, the standard practice for inverting a diagonal matrix under these circumstances is to reciprocate only the singular values above a threshold (see the sketch after this list). Here is a blog post that walks through the inversion step by step: https://www.johndcook.com/blog/2018/05/05/svd/

  • Approximating the inverse in this way is equivalent to approximating a low-rank covariance matrix (and its inverse) by throwing away the left and right singular vectors corresponding to singular values below the threshold.
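
For concreteness, here is a sketch of that standard recipe (my own code, equivalent in spirit to numpy.linalg.pinv with an rcond cutoff; note that it zeros out the dropped entries, whereas tikhonov_from_prior clips them to threshold):

```python
import numpy as np

def pinv_truncated(A, threshold=1e-4):
    """Pseudoinverse via SVD, reciprocating only singular values above `threshold`."""
    U, S, Vt = np.linalg.svd(A, full_matrices=False)
    S_inv = np.zeros_like(S)
    mask = S > threshold
    S_inv[mask] = 1. / S[mask]            # standard practice: dropped entries stay 0
    return Vt.T @ (S_inv[:, None] * U.T)
```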

Hope this helps!

pavanramkumar avatar Feb 23 '20 03:02 pavanramkumar

@pavanramkumar I do think what we are doing is a bit non-standard. The threshold should probably be 0 by default? And in any case, we should use scipy's pinv instead of inventing our own.
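
For instance, something along these lines (just a sketch, not the current implementation; pinvh assumes a symmetric matrix, which a prior covariance should be):

```python
import numpy as np
from scipy.linalg import pinvh, sqrtm

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))
prior_cov = A @ A.T                # symmetric and rank-deficient

cov_inv = pinvh(prior_cov)         # scipy handles the small-SV cutoff for us
Tau = np.real(sqrtm(cov_inv))      # matrix square root -> Tikhonov matrix, per above
```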

jasmainak avatar Feb 26 '20 03:02 jasmainak