GLM.jl
Version 2.0 Breaking Changes
I've noticed that some packages rely on `TableRegressionModel` to support GLM: https://github.com/jmboehm/RegressionTables.jl/issues/128, https://github.com/yufongpeng/AnovaBase.jl/issues/52 and https://github.com/yufongpeng/AnovaGLM.jl/issues/6. Even if they adapt to support the new approach, we'd better bump the version to 2.0 to avoid any breakage. That can also be the occasion to drop some long-deprecated API. We should check whether we would like to make any other breaking changes. (A few other packages use `TableRegressionModel` for their own models; it would be good if they also stopped using it, but there's no hurry.)
Originally posted by @nalimilan in https://github.com/JuliaStats/GLM.jl/issues/339#issuecomment-1242947968
Here's a quick list of potential issues that we might want to try to address as part of a push towards 2.0. Several are relatively straightforward, some could potentially be solved via more extensive documentation, and some will require Decisions to be made (e.g. all the stuff with weights).
- [x] #339
- [ ] remove deprecations (search the source for `deprecate` to catch "manual" deprecation warnings)
- [x] all the issues related to handling of rank deficiency / multicollinearity (#449, #426, #413, #375, #280) because this involves potentially changing defaults
- [ ] potentially exposing a way for the user to choose between QR/Cholesky with the formula interface? Maybe even defaulting to the slightly slower but more stable QR method?
- [ ] #483
- [ ] #487
- [ ] #350
- [ ] #259
- [ ] #255
- [ ] #240
- [ ] drop support for Julia < 1.6 and strip out all the associated tests for output in those versions
- [ ] make internal fieldnames more transparent or at least add some comments to the struct definitions
There are several other issues I would like to see addressed sooner rather than later, but all are technically nonbreaking, at least under ColPrac guidelines (e.g., changes to the show methods, as raised in #461 and #469).
Right now, we are working on GLM with QR decomposition in two steps:
- LM with QR
- GLM with QR

The target is to complete this by the end of this calendar year. Hopefully this will solve some of the issues related to `PosDefException` mentioned above.
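For readers less familiar with why the choice of factorization matters here, the following is a minimal sketch using only the `LinearAlgebra` standard library, not GLM.jl internals. The design matrix `X` and response `y` are made-up illustrative values with two identical columns, and `ColumnNorm()` assumes Julia 1.7 or later. It contrasts factorizing `X'X` with Cholesky against a column-pivoted QR of `X` itself.

```julia
using LinearAlgebra

# Hypothetical design matrix with two identical columns (perfect collinearity).
X = ones(4, 2)
y = [1.0, 2.0, 3.0, 4.0]

# Cholesky route: factorize X'X. With `check=false` the call returns the
# factorization object instead of throwing, so success can be inspected.
chol = cholesky(Symmetric(X'X); check=false)
println("Cholesky succeeded: ", issuccess(chol))

# QR route: column-pivoted QR works on X directly and never forms X'X.
F = qr(X, ColumnNorm())
β = F \ y
println("rank(X) = ", rank(X), ", coefficients = ", β)
```

With the default `check=true`, a failed Cholesky factorization is what surfaces to the user as a `PosDefException`.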
I would like to have multiple dependent variables and quasi-likelihood in GLM 2.0.
Nice to hear you're working on QR! I think we can wait until you finish that before tagging 2.0. On the other hand, multiple dependent variables and quasi-likelihood do not change current behavior, so they can be added later (and we have to discuss whether they should live in this package or in a separate one).
I don't think we should do anything about https://github.com/JuliaStats/GLM.jl/issues/259. Anyway, https://github.com/JuliaStats/GLM.jl/pull/487 will change `nobs` to return an integer, as now the presence of weights is part of the type so there's no type instability. People can use `size(modelmatrix(m), 1)` to find out the number of rows in the matrix if they need that information.
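As a small illustration of the two calls mentioned above, here is a sketch that assumes the GLM.jl and DataFrames.jl packages are installed. The data frame `df` and its column names are made up for the example, and the return type printed for `nobs` will depend on the GLM.jl version in use.

```julia
using DataFrames, GLM

# Hypothetical, complete data set (no weights, no missing values).
df = DataFrame(x = [1.0, 2.0, 3.0, 4.0, 5.0],
               y = [1.1, 2.0, 2.9, 4.2, 5.1])

m = lm(@formula(y ~ x), df)

# Number of rows in the model matrix that was actually fitted.
n_rows = size(modelmatrix(m), 1)
println("rows in model matrix: ", n_rows)

# Number of observations as reported by the model, with its type.
println("nobs(m): ", nobs(m), " (", typeof(nobs(m)), ")")
```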
#483, https://github.com/JuliaStats/GLM.jl/issues/255 and https://github.com/JuliaStats/GLM.jl/issues/240 would be good to have, but not breaking AFAICT.
I hope 2.0 fixes https://github.com/JuliaStats/GLM.jl/issues/496 and throws an error on missing values to protect users from making analytical errors by accident. `lm(...; skipmissing=true)` seems fine to me.
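Until something like that exists, the same safety can be approximated in user code. The sketch below assumes GLM.jl and DataFrames.jl are installed; `strict_lm` is a hypothetical helper written for this example, not part of GLM.jl, and the data are made up. It refuses to fit when a model column contains `missing`, so that dropping rows has to be an explicit step.

```julia
using DataFrames, GLM

# Hypothetical data; one response value is missing.
df = DataFrame(x = [1.0, 2.0, 3.0, 4.0, 5.0],
               y = [1.1, missing, 2.9, 4.2, 5.1])

# Hypothetical helper (not a GLM.jl function): error out if any of the
# listed columns contains a missing value, otherwise fit as usual.
function strict_lm(f, data, cols)
    for c in cols
        if any(ismissing, data[!, c])
            throw(ArgumentError("column $c contains missing values"))
        end
    end
    return lm(f, data)
end

# Dropping incomplete rows is now a visible, deliberate choice.
clean = dropmissing(df, [:x, :y])
m = strict_lm(@formula(y ~ x), clean, [:x, :y])
println("rows used: ", size(modelmatrix(m), 1))
```

Calling `strict_lm(@formula(y ~ x), df, [:x, :y])` on the original `df` would raise the `ArgumentError` instead of fitting.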