GLM.jl
GLM.jl copied to clipboard
Add support for rank deficient `GeneralizedLinearModel`
This pull request supersedes #314 because I don't have the write access to either REPOs.
In #314 an issue about slower converge has been discussed. This PR solves that issue and adds a test case about that issue.
cc: @Nosferican @nalimilan @jiahao @dmbates @andreasnoack @DilumAluthge
Codecov Report
Merging #340 (f69d819) into master (2692f3c) will decrease coverage by
10.17%
. The diff coverage is100.00%
.
:exclamation: Current head f69d819 differs from pull request most recent head 5c5f2ef. Consider uploading reports for the commit 5c5f2ef to get more accurate results
@@ Coverage Diff @@
## master #340 +/- ##
===========================================
- Coverage 81.08% 70.90% -10.18%
===========================================
Files 7 6 -1
Lines 703 543 -160
===========================================
- Hits 570 385 -185
- Misses 133 158 +25
Impacted Files | Coverage Δ | |
---|---|---|
src/glmfit.jl | 75.86% <100.00%> (-0.59%) |
:arrow_down: |
src/linpred.jl | 61.05% <100.00%> (-9.66%) |
:arrow_down: |
src/glmtools.jl | 47.61% <0.00%> (-32.89%) |
:arrow_down: |
src/lm.jl | 72.22% <0.00%> (-21.12%) |
:arrow_down: |
src/ftest.jl | 97.95% <0.00%> (-0.51%) |
:arrow_down: |
src/GLM.jl | ||
src/negbinfit.jl | 93.75% <0.00%> (+12.05%) |
:arrow_up: |
Continue to review full report at Codecov.
Legend - Click here to learn more
Δ = absolute <relative> (impact)
,ø = not affected
,? = missing data
Powered by Codecov. Last update b39253f...5c5f2ef. Read the comment docs.
As discussed in issue #273, the reason why we don't pivot by default is that it would produce inaccurate results by setting argument check=false
.
But now what I use here is the piv
vector which tells us linearly independent columns. So the result may no longer suffer from that.
Maybe we could check the rank when constructing DensePredChol to decide whether to compute CholeskyPivoted or not, or just compute the CholeskyPivoted by default, because if the matrix is full rank, then we would have piv=1:rank
. Then, we can leave a cleaner interface to users.
Well... It's hard to modify GLM without affecting LM, maybe we should keep the interface consistent, and merge it first?
I would love to see this one merged for two downstream packages. thanks!
Since the Project.toml is reverted, the Travis CI may fail. It doesn't mean that the code has a bug.
Since we'll squash and merge anyway you can either rebase on master, or merge master into this branch and fix the conflict that way.
Since this adds a feature (support for rank-deficient GLM), we should bump the minor version right?
Is there anything else holding up merging this? @andreasnoack can you take a quick look?
Bump @andreasnoack
Since this adds a feature (support for rank-deficient GLM), we should bump the minor version right?
Correct.
(Note: this rule only applies for packages that are >= version 1.0. Since GLM is at version >= 1.0, you are correct, we bump the minor version for new features).
@jason-xuan Would you have the time to add the comment @andreasnoack requested? I can't do that myself, yet it would be too bad that this PR would be blocked just because of this detail. Thanks!
@jason-xuan Would you have the time to add the comment @andreasnoack requested? I can't do that myself, yet it would be too bad that this PR would be blocked just because of this detail. Thanks!
No problem, I'll try to finish this next week
Codecov Report
Base: 84.12% // Head: 84.43% // Increases project coverage by +0.30%
:tada:
Coverage data is based on head (
a1bec45
) compared to base (fa74d14
). Patch coverage: 96.00% of modified lines in pull request are covered.
:exclamation: Current head a1bec45 differs from pull request most recent head 2eb96e5. Consider uploading reports for the commit 2eb96e5 to get more accurate results
Additional details and impacted files
@@ Coverage Diff @@
## master #340 +/- ##
==========================================
+ Coverage 84.12% 84.43% +0.30%
==========================================
Files 7 6 -1
Lines 819 790 -29
==========================================
- Hits 689 667 -22
+ Misses 130 123 -7
Impacted Files | Coverage Δ | |
---|---|---|
src/linpred.jl | 79.06% <95.83%> (+1.75%) |
:arrow_up: |
src/glmfit.jl | 80.14% <100.00%> (+1.54%) |
:arrow_up: |
src/glmtools.jl | 81.14% <0.00%> (-1.56%) |
:arrow_down: |
src/ftest.jl | 98.52% <0.00%> (-1.48%) |
:arrow_down: |
src/GLM.jl | ||
src/lm.jl | 96.69% <0.00%> (+0.63%) |
:arrow_up: |
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.
:umbrella: View full report at Codecov.
:loudspeaker: Do you have feedback about the report comment? Let us know in this issue.
Which part here fixed the slow convergence?
@jason-xuan Do you plan to finish this? It would be too bad to let this be forgotten!
Sure! I'll see if I could figure out how to get a test case from R next week because I'm new to it.
If you prefer you can give me a Julia example and I'll run it in R for you. Though the syntax is quite close so it shouldn't be too hard.
If you prefer you can give me a Julia example and I'll run it in R for you. Though the syntax is quite close so it shouldn't be too hard.
[test/runtests.jl](https://github.com/jason-xuan/GLM.jl/blob/a1bec45accb4a0a94dd2b891e2e45e844d379396/test/runtests.jl#L174) The result of the uncomment check, I tried but I can't reproduce the result from R myself.
@nalimilan I calculated the deviance with RCall and got the following results:
julia> @rput dfrm;
R> fit <- glm(y~1+x1+x2+x3,data=dfrm,family=binomial())
R> deviance(fit)
[1] 138626.5
Compared with deviance(m1.model)=138626.46758072695
, it seems to be a rounded version of my result. Could it be considered OK? Is there a way to get a more accurate result?
R is just omitting the remaining digits. You can do print(deviance(fit), digits=10)
to see more of them. But probably the result is the same (for practical purposes).