GLM.jl
GLM.jl copied to clipboard
Implementation of `dropcollinear` feature in `GeneralizedLinearModel`
This pull request supersedes #340
We are looking for the dropcollinear
feature in GLM and found #340. The PR, which opened in Oct'2019, looks very close to complete.
Apart from the changes covered in PR 340, we have added
- A test case to compare LM with equivalent GLM
- Two test cases to match output with R
- Another implementation of
dof
forCholeskyPivoted
since thedof
will be calculated based on rank, instead of the number of coefficients. - update API documentation: Added
dropcollinear
in the list of keyword arguments.
We have performed logistic regression with 10 independent variables, 10,000 rows and dropcollinear
feature true
and false
; the dataset does not have a multicollinearity issue. The machine configuration is as follows: -
OS – Linux,
CPU - AMD Ryzen 5 3600 6-Core Processor
RAM – 64 GB
With dropcollinear
= true
: average time 4.381 ms
With dropcollinear
= false
: average time 4.396 ms
Codecov Report
Base: 87.08% // Head: 87.32% // Increases project coverage by +0.24%
:tada:
Coverage data is based on head (
07e44de
) compared to base (e2e9c85
). Patch coverage: 100.00% of modified lines in pull request are covered.
Additional details and impacted files
@@ Coverage Diff @@
## master #488 +/- ##
==========================================
+ Coverage 87.08% 87.32% +0.24%
==========================================
Files 7 7
Lines 929 947 +18
==========================================
+ Hits 809 827 +18
Misses 120 120
Impacted Files | Coverage Δ | |
---|---|---|
src/glmfit.jl | 81.01% <100.00%> (+0.10%) |
:arrow_up: |
src/linpred.jl | 85.61% <100.00%> (+2.00%) |
:arrow_up: |
src/lm.jl | 93.33% <100.00%> (-0.05%) |
:arrow_down: |
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.
:umbrella: View full report at Codecov.
:loudspeaker: Do you have feedback about the report comment? Let us know in this issue.
@mousum-github Can you fix the failing doctests?
Hi @nalimilan,
Covered most of your suggestions. But there is a conflict in glmfit.jl
file and unable check whether all checks are passed or not.
Sorry for the delay in response.
cc: @ViralBShah
Thanks. I can have a look at the conflict once the PR is ready if you can't fix it. Just ensure tests pass on your side for now.
I've a branch which fixes the conflict with master but I can't push the changes to your branch. Can you give me the necessary permissions? See https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/allowing-changes-to-a-pull-request-branch-created-from-a-fork. Otherwise you can do git fetch && git cherry-pick 998393f9d31b29a3bde77373c2fbbfd8844353cb
to apply my merge commit to your branch.
I've a branch which fixes the conflict with master but I can't push the changes to your branch. Can you give me the necessary permissions? See https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/allowing-changes-to-a-pull-request-branch-created-from-a-fork. Otherwise you can do
git fetch && git cherry-pick 998393f9d31b29a3bde77373c2fbbfd8844353cb
to apply my merge commit to your branch.
Hi Milan, Since we have done the PR from an organisation account, GitHub doesn't permit us to allow edits from maintainers. Can we please add you as an outside collaborator to xKDR?
May I suggest we give @mousum-github commit access here, with the usual rules of working through PRs like we have everyone else?
May I suggest we give @mousum-github commit access here
This will not resolve the issue. It is the reverse that is needed (i.e. @nalimilan needs to have access to PR source repository).
Regarding commit access (as it is an independent issue).
If someone from xKDR team (or the whole team) were willing to work on improving GLM.jl in long term it would be really great. There are many open issues in GLM.jl currently + there are loads of things that could be added.
However, is that it is not the person who makes a PR, but a person who reviews it that needs commit access. Currently mostly @nalimilan serves as a reviewer on this repository (and @nalimilan - thank you for that). Of course we could change that in the future (there are many other packages that need attention by @nalimilan so having someone else oversee GLM.jl after some transition period, during which the person would learn JuliaStats standards, would be great).
Given these considerations @mousum-github - would you find having commit access to GLM.jl useful?
May I suggest we give @mousum-github commit access here
This will not resolve the issue. It is the reverse that is needed (i.e. @nalimilan needs to have access to PR source repository).
Regarding commit access (as it is an independent issue).
If someone from xKDR team (or the whole team) were willing to work on improving GLM.jl in long term it would be really great. There are many open issues in GLM.jl currently + there are loads of things that could be added.
However, is that it is not the person who makes a PR, but a person who reviews it that needs commit access. Currently mostly @nalimilan serves as a reviewer on this repository (and @nalimilan - thank you for that). Of course we could change that in the future (there are many other packages that need attention by @nalimilan so having someone else oversee GLM.jl after some transition period, during which the person would learn JuliaStats standards, would be great).
Given these considerations @mousum-github - would you find having commit access to GLM.jl useful?
Right - my suggestion was indirect - so that @mousum-github can then make future PRs using branches on this repo, and perhaps even move this repo from a fork to a branch. That would make it easier for @nalimilan and everyone to edit directly and what not. And this would also serve as a first step towards having more contributors such as @mousum-github overseeing this repo.
I am sure that the commit access will be very useful, and I am happy to help maintain the project going forward.
Thanks @ayushpatnaikgit. Not sure why GitHub doesn't support that, but thanks to the access to the xKDR repo I could push my merge commit. Tests still appear to fail though.
Usually we only give commit access to people who have made several PRs over a relatively long period, but given that xKDR is invested in improving the stats ecosystem I guess we can accelerate that process. I've just sent an invitation to @mousum-github. Help reviewing other PRs would of course be welcome!
Thanks @ayushpatnaikgit. Not sure why GitHub doesn't support that, but thanks to the access to the xKDR repo I could push my merge commit. Tests still appear to fail though.
Usually we only give commit access to people who have made several PRs over a relatively long period, but given that xKDR is invested in improving the stats ecosystem I guess we can accelerate that process. I've just sent an invitation to @mousum-github. Help reviewing other PRs would of course be welcome!
Thanks @nalimilan
Hi @nalimilan, We have updated the code and documentation.
We have updated the newly added nulldeviance
and nullloglikelihood
functions in glm
. Passed the dropcollinear
argument value to fit
function for null models.
We also added a test case to check the PosDefException
exception when dropcollinear
is false for a rank-deficient model. Unfortunately, the test case was not passed in a few systems. So, we have commented the test case for timing.
We also added a test case to check the
PosDefException
exception whendropcollinear
is false for a rank-deficient model. Unfortunately, the test case was not passed in a few systems. So, we have commented the test case for timing.
Maybe another example would pass tests more reliably? We really need to check that path. How about trying with categorical variables, like in the "rankdeficient" testset for linear models?
Okay. Let me check.
Okay. Let me check.
Added two test cases similar to rankdeficient
.