
tikhonov_from_prior treats zero_indices incorrectly

Open ivarzap opened this issue 5 years ago • 7 comments

https://github.com/glm-tools/pyglmnet/blob/b403bac72f5a227a9c34e38bfef1ff0c937b7720/pyglmnet/utils.py#L85

Should be: `S_inv = 1. / S_inv`

Otherwise, directions with small singular values end up almost unregularized.
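
For readers without the source open, here is a minimal sketch of what I believe the linked lines do (paraphrased; exact names and surrounding code may differ from utils.py), with the proposed change in a comment:

```python
import numpy as np

# Paraphrase of the step in question in tikhonov_from_prior.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))
prior_cov = A @ A.T                  # rank-deficient 5x5 prior covariance
threshold = 1e-4                     # the default cutoff

U, S, V = np.linalg.svd(prior_cov, full_matrices=False)
nonzero_indices = np.where(S > threshold)[0]
zero_indices = np.where(S <= threshold)[0]

S_inv = S.copy()
S_inv[zero_indices] = threshold                       # clip tiny SVs up to the cutoff
S_inv[nonzero_indices] = 1. / S_inv[nonzero_indices]  # current: invert only the large SVs
# Proposed instead: invert everything, so the clipped entries become 1/threshold:
# S_inv = 1. / S_inv
```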

ivarzap avatar Jan 14 '20 17:01 ivarzap

This is the standard way to invert a matrix; it's the Moore-Penrose pseudoinverse. Why do you see a problem with this?

jasmainak avatar Jan 17 '20 11:01 jasmainak

What is the reason to use the Moore-Penrose pseudoinverse for regularization?

In that case, say you have a singular value a bit over the threshold (say 0.0002 with the current defaults). Then the projected space will heavily regularize that direction (as expected). But singular values that do not survive the cutoff (<= threshold) will hardly be regularized at all (only by the tiny quantity = threshold). I see that this could be undesirable behavior when regularizing close-to-singular `prior_cov` matrices.
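
A toy calculation of the asymmetry (my numbers, not from pyglmnet):

```python
threshold = 1e-4               # the default in tikhonov_from_prior
s_above = 2e-4                 # survives the cutoff
s_below = 5e-5                 # fails the cutoff

penalty_above = 1. / s_above   # 5000.0 -> this direction is heavily regularized
penalty_below = threshold      # 0.0001 -> this one is left essentially untouched
print(penalty_above / penalty_below)   # ~5e7: the asymmetry described above
```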

ivarzap avatar Jan 17 '20 15:01 ivarzap

Do you observe this problem in your data? What's the standard way to deal with this? If you just invert a 0 in a singular matrix, it will blow up.

jasmainak avatar Jan 17 '20 16:01 jasmainak

I am not yet using your code to regularize an elastic net model, but I'm planning to implement it very soon.

My issue is of a theoretical character: I wonder in what case one would want the small singular value (SV) directions to be left almost unregularized while the larger SV directions are penalized.

In my current understanding, one introduces a ridge regularizer (a Tikhonov matrix proportional to the identity) that shifts the spectrum of X.T.dot(X) before inversion, stabilizing the regression by strongly suppressing the small-SV directions.
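
To make that concrete (my own illustration, not pyglmnet code): in SVD terms, ridge replaces the pseudoinverse factor 1/s with s/(s^2 + lambda), which goes to zero as s -> 0 instead of blowing up:

```python
# Ridge scales each singular direction by s / (s**2 + lam) rather than 1 / s,
# so small-SV directions are suppressed instead of amplified.
lam = 1e-2
for s in (1.0, 1e-2, 1e-6):
    print(f"s={s:.0e}  pinv={1. / s:.3g}  ridge={s / (s**2 + lam):.3g}")
```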

On the other hand, the computation proposed in tikhonov_from_prior inverts only the sufficiently large singular values of prior_cov, which amounts to adding a bias against the small-SV directions; yet the directions that are really small (<= threshold) are hardly regularized at all. I would like to understand the basis for this use of the regularizer.

In my proposal, since the small SVs are first shifted up to threshold, inverting those entries as well would make the Tikhonov matrix very large (~1e4 with the default threshold) in those directions, and, for all practical purposes, they would disappear from the regression.
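
Concretely, with the same hypothetical names as in the sketch above:

```python
threshold = 1e-4
s_small = 5e-5                      # a singular value below the cutoff

clipped = max(s_small, threshold)   # shifted up to the threshold first
current_penalty = clipped           # 1e-4: the direction is barely penalized
proposed_penalty = 1. / clipped     # 1e4: the direction effectively drops out
```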

ivarzap avatar Jan 17 '20 17:01 ivarzap

@pavanramkumar do you have any comments here?

jasmainak avatar Jan 22 '20 17:01 jasmainak

@ivarzap thanks for your question.

  • What we are calling the Tikhonov matrix is the matrix square root of the inverse of the prior covariance matrix. In other words, the inner product of the Tikhonov matrix with itself is the inverse of the covariance matrix. Notation from Wikipedia: https://en.wikipedia.org/wiki/Tikhonov_regularization

  • We are using SVD to compute the inverse of the covariance matrix. If the covariance matrix is not full rank, there will be some singular values very close to zero that prevent inverting the diagonal matrix S by simply reciprocating all of its entries. As @jasmainak said above, the standard practice for inverting a diagonal matrix under these circumstances is to reciprocate only the singular values above a threshold (see the sketch after this list). Here is a blog post that walks through the inversion step by step: https://www.johndcook.com/blog/2018/05/05/svd/

  • Approximating the inverse in this way is equivalent to approximating a low-rank covariance matrix (and its inverse) by throwing away the left and right singular vectors corresponding to singular values below the threshold.
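
For concreteness, here is a sketch of that standard recipe (my own code, equivalent in spirit to numpy.linalg.pinv with an rcond cutoff; note that it zeros out the dropped entries, whereas tikhonov_from_prior clips them to threshold):

```python
import numpy as np

def pinv_truncated(A, threshold=1e-4):
    """Pseudoinverse via SVD, reciprocating only singular values above `threshold`."""
    U, S, Vt = np.linalg.svd(A, full_matrices=False)
    S_inv = np.zeros_like(S)
    mask = S > threshold
    S_inv[mask] = 1. / S[mask]            # standard practice: dropped entries stay 0
    return Vt.T @ (S_inv[:, None] * U.T)
```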

Hope this helps!

pavanramkumar avatar Feb 23 '20 03:02 pavanramkumar

@pavanramkumar I do think what we are doing is a bit non-standard. The threshold should probably be 0 by default? And in any case, we should use scipy's pinv instead of inventing our own.
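
For instance, something along these lines (just a sketch, not the current implementation; pinvh assumes a symmetric matrix, which a prior covariance should be):

```python
import numpy as np
from scipy.linalg import pinvh, sqrtm

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))
prior_cov = A @ A.T                # symmetric and rank-deficient

cov_inv = pinvh(prior_cov)         # scipy handles the small-SV cutoff for us
Tau = np.real(sqrtm(cov_inv))      # matrix square root -> Tikhonov matrix, per above
```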

jasmainak avatar Feb 26 '20 03:02 jasmainak