GLM.jl icon indicating copy to clipboard operation
GLM.jl copied to clipboard

Add support for rank deficient `GeneralizedLinearModel`

Open jason-xuan opened this issue 5 years ago • 17 comments

This pull request supersedes #314 because I don't have the write access to either REPOs.

In #314 an issue about slower converge has been discussed. This PR solves that issue and adds a test case about that issue.

cc: @Nosferican @nalimilan @jiahao @dmbates @andreasnoack @DilumAluthge

jason-xuan avatar Oct 28 '19 20:10 jason-xuan

Codecov Report

Merging #340 (f69d819) into master (2692f3c) will decrease coverage by 10.17%. The diff coverage is 100.00%.

:exclamation: Current head f69d819 differs from pull request most recent head 5c5f2ef. Consider uploading reports for the commit 5c5f2ef to get more accurate results Impacted file tree graph

@@             Coverage Diff             @@
##           master     #340       +/-   ##
===========================================
- Coverage   81.08%   70.90%   -10.18%     
===========================================
  Files           7        6        -1     
  Lines         703      543      -160     
===========================================
- Hits          570      385      -185     
- Misses        133      158       +25     
Impacted Files Coverage Δ
src/glmfit.jl 75.86% <100.00%> (-0.59%) :arrow_down:
src/linpred.jl 61.05% <100.00%> (-9.66%) :arrow_down:
src/glmtools.jl 47.61% <0.00%> (-32.89%) :arrow_down:
src/lm.jl 72.22% <0.00%> (-21.12%) :arrow_down:
src/ftest.jl 97.95% <0.00%> (-0.51%) :arrow_down:
src/GLM.jl
src/negbinfit.jl 93.75% <0.00%> (+12.05%) :arrow_up:

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update b39253f...5c5f2ef. Read the comment docs.

codecov-io avatar Oct 31 '19 01:10 codecov-io

As discussed in issue #273, the reason why we don't pivot by default is that it would produce inaccurate results by setting argument check=false.

But now what I use here is the piv vector which tells us linearly independent columns. So the result may no longer suffer from that.

Maybe we could check the rank when constructing DensePredChol to decide whether to compute CholeskyPivoted or not, or just compute the CholeskyPivoted by default, because if the matrix is full rank, then we would have piv=1:rank. Then, we can leave a cleaner interface to users.

Well... It's hard to modify GLM without affecting LM, maybe we should keep the interface consistent, and merge it first?

jason-xuan avatar Nov 03 '19 02:11 jason-xuan

I would love to see this one merged for two downstream packages. thanks!

AsafManela avatar Jan 03 '20 15:01 AsafManela

Since the Project.toml is reverted, the Travis CI may fail. It doesn't mean that the code has a bug.

jason-xuan avatar Jan 12 '20 23:01 jason-xuan

Since we'll squash and merge anyway you can either rebase on master, or merge master into this branch and fix the conflict that way.

Since this adds a feature (support for rank-deficient GLM), we should bump the minor version right?

Is there anything else holding up merging this? @andreasnoack can you take a quick look?

kleinschmidt avatar Mar 05 '20 14:03 kleinschmidt

Bump @andreasnoack

DilumAluthge avatar Mar 13 '20 13:03 DilumAluthge

Since this adds a feature (support for rank-deficient GLM), we should bump the minor version right?

Correct.

(Note: this rule only applies for packages that are >= version 1.0. Since GLM is at version >= 1.0, you are correct, we bump the minor version for new features).

DilumAluthge avatar Mar 13 '20 13:03 DilumAluthge

@jason-xuan Would you have the time to add the comment @andreasnoack requested? I can't do that myself, yet it would be too bad that this PR would be blocked just because of this detail. Thanks!

nalimilan avatar Mar 09 '21 11:03 nalimilan

@jason-xuan Would you have the time to add the comment @andreasnoack requested? I can't do that myself, yet it would be too bad that this PR would be blocked just because of this detail. Thanks!

No problem, I'll try to finish this next week

jason-xuan avatar Mar 14 '21 07:03 jason-xuan

Codecov Report

Base: 84.12% // Head: 84.43% // Increases project coverage by +0.30% :tada:

Coverage data is based on head (a1bec45) compared to base (fa74d14). Patch coverage: 96.00% of modified lines in pull request are covered.

:exclamation: Current head a1bec45 differs from pull request most recent head 2eb96e5. Consider uploading reports for the commit 2eb96e5 to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #340      +/-   ##
==========================================
+ Coverage   84.12%   84.43%   +0.30%     
==========================================
  Files           7        6       -1     
  Lines         819      790      -29     
==========================================
- Hits          689      667      -22     
+ Misses        130      123       -7     
Impacted Files Coverage Δ
src/linpred.jl 79.06% <95.83%> (+1.75%) :arrow_up:
src/glmfit.jl 80.14% <100.00%> (+1.54%) :arrow_up:
src/glmtools.jl 81.14% <0.00%> (-1.56%) :arrow_down:
src/ftest.jl 98.52% <0.00%> (-1.48%) :arrow_down:
src/GLM.jl
src/lm.jl 96.69% <0.00%> (+0.63%) :arrow_up:

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

:umbrella: View full report at Codecov.
:loudspeaker: Do you have feedback about the report comment? Let us know in this issue.

codecov-commenter avatar Sep 14 '21 03:09 codecov-commenter

Which part here fixed the slow convergence?

andreasnoack avatar Sep 28 '21 06:09 andreasnoack

@jason-xuan Do you plan to finish this? It would be too bad to let this be forgotten!

nalimilan avatar Feb 06 '22 14:02 nalimilan

Sure! I'll see if I could figure out how to get a test case from R next week because I'm new to it.

jason-xuan avatar Feb 06 '22 14:02 jason-xuan

If you prefer you can give me a Julia example and I'll run it in R for you. Though the syntax is quite close so it shouldn't be too hard.

nalimilan avatar Feb 06 '22 21:02 nalimilan

If you prefer you can give me a Julia example and I'll run it in R for you. Though the syntax is quite close so it shouldn't be too hard.

[test/runtests.jl](https://github.com/jason-xuan/GLM.jl/blob/a1bec45accb4a0a94dd2b891e2e45e844d379396/test/runtests.jl#L174) The result of the uncomment check, I tried but I can't reproduce the result from R myself.

jason-xuan avatar Feb 08 '22 13:02 jason-xuan

@nalimilan I calculated the deviance with RCall and got the following results:

julia> @rput dfrm;

R> fit <- glm(y~1+x1+x2+x3,data=dfrm,family=binomial())

R> deviance(fit)
[1] 138626.5

Compared with deviance(m1.model)=138626.46758072695, it seems to be a rounded version of my result. Could it be considered OK? Is there a way to get a more accurate result?

jason-xuan avatar Mar 25 '22 16:03 jason-xuan

R is just omitting the remaining digits. You can do print(deviance(fit), digits=10) to see more of them. But probably the result is the same (for practical purposes).

nalimilan avatar Mar 25 '22 17:03 nalimilan