Support for Ridge Regression
Allow passing a positive semi-definite matrix Γ such that the penalized coefficient estimate is

β = inv(X'W*X + Γ) * X'W*y
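For the linear case with fixed weights this is a direct solve. A minimal sketch, assuming a user-supplied penalty matrix `Γ` (the function name `ridge_wls` is hypothetical, not existing GLM.jl API):

```julia
using LinearAlgebra

# Weighted ridge solve: minimizes (y - Xβ)'W(y - Xβ) + β'Γβ
# for a positive semi-definite penalty matrix Γ.
function ridge_wls(X::AbstractMatrix, y::AbstractVector, W::Diagonal, Γ::AbstractMatrix)
    # solve the penalized normal equations (X'WX + Γ) β = X'Wy
    return (X' * W * X + Γ) \ (X' * W * y)
end

# Example: ordinary ridge with scalar penalty λ (W = I gives unweighted ridge)
# β̂ = ridge_wls(X, y, Diagonal(ones(length(y))), λ * I(size(X, 2)))
```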
Can you elaborate on the concrete changes you suggest? Shouldn't/couldn't it be implemented in a separate package?
I think there already is a package with ridge regression, though I can't recall offhand which one it is.
There is this implementation in MultivariateStats.jl, but it only covers linear models, not GLMs. I haven't seen any for ridge logistic regression, for example.
I'm not sure you would want that particular formula for the GLM case. As the name implies, the Iteratively Reweighted Least Squares (IRLS) algorithm iterates on the W matrix. Would a fixed value of Γ make sense?
@dmbates I think this article might have the answer (if not, the references, or reaching out to the authors, might). Wikipedia also has the formula used for Ridge Poisson regression.
This implementation has IRLS with a ridge penalty for logistic regression.
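For reference, a minimal sketch of what penalized IRLS looks like for ridge logistic regression, with Γ held fixed while the weights W are re-estimated each iteration (function and variable names are illustrative, not GLM.jl API):

```julia
using LinearAlgebra

# Penalized IRLS for logistic regression: each Newton step solves
# (X'WX + Γ) β = X'Wz with a fixed penalty Γ and freshly updated weights W.
function ridge_logistic_irls(X::AbstractMatrix, y::AbstractVector, Γ::AbstractMatrix;
                             maxiter::Int = 25, tol::Real = 1e-8)
    β = zeros(size(X, 2))
    for _ in 1:maxiter
        η = X * β                      # linear predictor
        μ = @. 1 / (1 + exp(-η))       # mean function (logistic link)
        w = @. μ * (1 - μ)             # IRLS weights, re-estimated every iteration
        z = η .+ (y .- μ) ./ w         # working response
        βnew = (X' * Diagonal(w) * X + Γ) \ (X' * Diagonal(w) * z)
        norm(βnew - β) < tol && return βnew
        β = βnew
    end
    return β
end
```

With `Γ` set to zero this reduces to plain IRLS, which also suggests one answer to the fixed-Γ question above: the penalty enters each Newton system unchanged while only W iterates.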
Ideally, rather than implementing just ridge, implementing a Bayesian GLM would provide these features and more.
Would support for logistic regression with ℓ1 regularization (a special case of Bayesian GLM) be part of this issue, or should it be a separate issue? (Here's an algorithm.)
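One standard algorithm for this (not necessarily the one linked) is proximal gradient descent with soft-thresholding. A minimal sketch, where the fixed step size `t` and all names are illustrative assumptions:

```julia
# Soft-thresholding operator: the proximal map of the ℓ1 penalty.
soft_threshold(x, τ) = sign(x) * max(abs(x) - τ, 0)

# Proximal gradient descent (ISTA) for ℓ1-penalized logistic regression.
function lasso_logistic(X::AbstractMatrix, y::AbstractVector, λ::Real;
                        t::Real = 1e-3, iters::Int = 5000)
    β = zeros(size(X, 2))
    for _ in 1:iters
        η = X * β
        μ = @. 1 / (1 + exp(-η))
        g = X' * (μ .- y)                        # gradient of the negative log-likelihood
        β = soft_threshold.(β .- t .* g, t * λ)  # proximal step for the ℓ1 penalty
    end
    return β
end
```

In practice the step size would be picked from the Lipschitz constant of the gradient (e.g. `4 / opnorm(X)^2`) or by line search rather than hard-coded.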
Perhaps instead of going full-blown Bayesian, support for elastic net regularization would be great. The choice of features included in H2O.ai's GLM implementation seems practical.
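For concreteness, the elastic net penalty mixes the ℓ1 and ℓ2 terms; a sketch with the usual parameterization, λ for overall strength and α for the mix (names assumed, not from any particular package):

```julia
# Elastic net penalty: α = 1 gives LASSO, α = 0 gives ridge.
elastic_net_penalty(β, λ, α) = λ * (α * sum(abs, β) + (1 - α) / 2 * sum(abs2, β))
```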
Regularizations such as Ridge and LASSO are special cases of maximum a posteriori (MAP) estimation, which under the special case of uninformative priors reduces to maximum likelihood estimation (MLE). I do have plans to develop the MAP framework eventually, as I believe most of the tools are available. However, it still requires a bit more work to generalize it (e.g., using MCMC).

Ridge in particular is a strange mutant, as it requires standardizing the linear predictor, assuming normality, and using a standard normal for all priors. Still, it has the nice feature of being scale-free. LASSO can be handled with specialized solvers, but the general cases do require some work (e.g., group LASSO and all its variants). I ain't super sold on elastic net, but it is doable with the MAP framework.

MAP is still a quasi-Bayesian approach, as it doesn't really estimate the whole distribution, but it is computationally feasible and gets you the essential aspects of Bayesian inference. After getting basic support for most of regression analysis and whatnot, the other applications can be implemented as interest builds for each.
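To make the MAP–penalty correspondence concrete, a sketch of the generic MAP objective, where the prior's negative log-density plays the role of the penalty (names are illustrative):

```julia
# Generic MAP objective: negative log-likelihood plus negative log-prior.
# Independent Normal(0, 1/λ) priors on β reduce (up to a constant) to the
# ridge term below; Laplace priors give the LASSO's ℓ1 term instead.
map_objective(β, negloglik, neglogprior) = negloglik(β) + neglogprior(β)

# Ridge as MAP (illustrative):
ridge_neglogprior(β, λ) = (λ / 2) * sum(abs2, β)
```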