Support for Ridge Regression
Allow passing a positive semi-definite matrix Γ such that the penalized coefficient estimate is

β = inv(X'W*X + Γ) * X'W*y
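For the linear case with fixed weights this is a direct solve. A minimal sketch, assuming a user-supplied penalty matrix `Γ` (the function name `ridge_wls` is hypothetical, not existing GLM.jl API):

```julia
using LinearAlgebra

# Weighted ridge solve: minimizes (y - Xβ)'W(y - Xβ) + β'Γβ
# for a positive semi-definite penalty matrix Γ.
function ridge_wls(X::AbstractMatrix, y::AbstractVector, W::Diagonal, Γ::AbstractMatrix)
    # solve the penalized normal equations (X'WX + Γ) β = X'Wy
    return (X' * W * X + Γ) \ (X' * W * y)
end

# Example: ordinary ridge with scalar penalty λ (W = I gives unweighted ridge)
# β̂ = ridge_wls(X, y, Diagonal(ones(length(y))), λ * I(size(X, 2)))
```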
Can you elaborate on the concrete changes you suggest? Shouldn't/couldn't it be implemented in a separate package?
I think there already is a package with ridge regression, though I can't recall offhand which one it is.
There is this implementation in MultivariateStats.jl, but it only covers linear models, not GLMs. I haven't seen any for ridge logistic regression, for example.
I'm not sure you would want that particular formula for the GLM case. As the name implies, the Iteratively Reweighted Least Squares (IRLS) algorithm iterates on the W matrix. Would a fixed value of Γ make sense?
@dmbates I think this article might have the answer (if not, the references, or reaching out to the authors, might). Wikipedia also has the formula used for Ridge Poisson regression.
This implementation has IRLS with a ridge penalty for logistic regression.
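For reference, a minimal sketch of what penalized IRLS looks like for ridge logistic regression, with Γ held fixed while the weights W are re-estimated each iteration (function and variable names are illustrative, not GLM.jl API):

```julia
using LinearAlgebra

# Penalized IRLS for logistic regression: each Newton step solves
# (X'WX + Γ) β = X'Wz with a fixed penalty Γ and freshly updated weights W.
function ridge_logistic_irls(X::AbstractMatrix, y::AbstractVector, Γ::AbstractMatrix;
                             maxiter::Int = 25, tol::Real = 1e-8)
    β = zeros(size(X, 2))
    for _ in 1:maxiter
        η = X * β                      # linear predictor
        μ = @. 1 / (1 + exp(-η))       # mean function (logistic link)
        w = @. μ * (1 - μ)             # IRLS weights, re-estimated every iteration
        z = η .+ (y .- μ) ./ w         # working response
        βnew = (X' * Diagonal(w) * X + Γ) \ (X' * Diagonal(w) * z)
        norm(βnew - β) < tol && return βnew
        β = βnew
    end
    return β
end
```

With `Γ` set to zero this reduces to plain IRLS, which also suggests one answer to the fixed-Γ question above: the penalty enters each Newton system unchanged while only W iterates.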
Ideally, rather than implementing just ridge, implementing a Bayesian GLM would provide these features and more.
Would support for logistic regression with ℓ1 regularization (a special case of Bayesian GLM) be part of this issue, or should it be a separate issue? (Here's an algorithm.)
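One standard algorithm for this (not necessarily the one linked) is proximal gradient descent with soft-thresholding. A minimal sketch, where the fixed step size `t` and all names are illustrative assumptions:

```julia
# Soft-thresholding operator: the proximal map of the ℓ1 penalty.
soft_threshold(x, τ) = sign(x) * max(abs(x) - τ, 0)

# Proximal gradient descent (ISTA) for ℓ1-penalized logistic regression.
function lasso_logistic(X::AbstractMatrix, y::AbstractVector, λ::Real;
                        t::Real = 1e-3, iters::Int = 5000)
    β = zeros(size(X, 2))
    for _ in 1:iters
        η = X * β
        μ = @. 1 / (1 + exp(-η))
        g = X' * (μ .- y)                        # gradient of the negative log-likelihood
        β = soft_threshold.(β .- t .* g, t * λ)  # proximal step for the ℓ1 penalty
    end
    return β
end
```

In practice the step size would be picked from the Lipschitz constant of the gradient (e.g. `4 / opnorm(X)^2`) or by line search rather than hard-coded.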
Perhaps instead of going full-blown Bayesian, support for elastic net regularization would be great. The choice of features included in H2O.ai's GLM implementation seems practical.
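For concreteness, the elastic net penalty mixes the ℓ1 and ℓ2 terms; a sketch with the usual parameterization, λ for overall strength and α for the mix (names assumed, not from any particular package):

```julia
# Elastic net penalty: α = 1 gives LASSO, α = 0 gives ridge.
elastic_net_penalty(β, λ, α) = λ * (α * sum(abs, β) + (1 - α) / 2 * sum(abs2, β))
```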
Regularizations such as Ridge and LASSO are special cases of maximum a posteriori (MAP) estimation, which under the special case of uninformative priors reduces to maximum likelihood estimation (MLE). I do have plans to develop the MAP framework eventually, as I believe most of the tools are available. However, it still requires a bit more work to generalize it (e.g., using MCMC).

Ridge in particular is a strange mutant, as it requires standardizing the linear predictor, assuming normality, and using a standard normal for all priors. Still, it has the nice feature of being scale-free. LASSO can be handled with specialized solvers, but the general cases do require some work (e.g., group LASSO and all its variants). I ain't super sold on elastic net, but it is doable with the MAP framework.

MAP is still a quasi-Bayesian approach, as it doesn't really estimate the whole distribution, but it is computationally feasible and gets you the essential aspects of Bayesian inference. After getting basic support for most of regression analysis and whatnot, the other applications can be implemented as interest builds for each.
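To make the MAP–penalty correspondence concrete, a sketch of the generic MAP objective, where the prior's negative log-density plays the role of the penalty (names are illustrative):

```julia
# Generic MAP objective: negative log-likelihood plus negative log-prior.
# Independent Normal(0, 1/λ) priors on β reduce (up to a constant) to the
# ridge term below; Laplace priors give the LASSO's ℓ1 term instead.
map_objective(β, negloglik, neglogprior) = negloglik(β) + neglogprior(β)

# Ridge as MAP (illustrative):
ridge_neglogprior(β, λ) = (λ / 2) * sum(abs2, β)
```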