GLM.jl icon indicating copy to clipboard operation
GLM.jl copied to clipboard

Implementation of `dropcollinear` feature in `GeneralizedLinearModel`

Open mousum-github opened this issue 2 years ago • 2 comments

This pull request supersedes #340

We are looking for the dropcollinear feature in GLM and found #340. The PR, which opened in Oct'2019, looks very close to complete.

Apart from the changes covered in PR 340, we have added

  • A test case to compare LM with equivalent GLM
  • Two test cases to match output with R
  • Another implementation of dof for CholeskyPivoted since the dof will be calculated based on rank, instead of the number of coefficients.
  • update API documentation: Added dropcollinear in the list of keyword arguments.

We have performed logistic regression with 10 independent variables, 10,000 rows and dropcollinear feature true and false; the dataset does not have a multicollinearity issue. The machine configuration is as follows: - OS – Linux, CPU - AMD Ryzen 5 3600 6-Core Processor RAM – 64 GB

With dropcollinear = true: average time 4.381 ms With dropcollinear = false: average time 4.396 ms

mousum-github avatar Jul 15 '22 17:07 mousum-github

Codecov Report

Base: 87.08% // Head: 87.32% // Increases project coverage by +0.24% :tada:

Coverage data is based on head (07e44de) compared to base (e2e9c85). Patch coverage: 100.00% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #488      +/-   ##
==========================================
+ Coverage   87.08%   87.32%   +0.24%     
==========================================
  Files           7        7              
  Lines         929      947      +18     
==========================================
+ Hits          809      827      +18     
  Misses        120      120              
Impacted Files Coverage Δ
src/glmfit.jl 81.01% <100.00%> (+0.10%) :arrow_up:
src/linpred.jl 85.61% <100.00%> (+2.00%) :arrow_up:
src/lm.jl 93.33% <100.00%> (-0.05%) :arrow_down:

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

:umbrella: View full report at Codecov.
:loudspeaker: Do you have feedback about the report comment? Let us know in this issue.

codecov-commenter avatar Jul 15 '22 18:07 codecov-commenter

@mousum-github Can you fix the failing doctests?

ViralBShah avatar Sep 20 '22 19:09 ViralBShah

Hi @nalimilan, Covered most of your suggestions. But there is a conflict in glmfit.jl file and unable check whether all checks are passed or not. Sorry for the delay in response. cc: @ViralBShah

mousum-github avatar Oct 02 '22 02:10 mousum-github

Thanks. I can have a look at the conflict once the PR is ready if you can't fix it. Just ensure tests pass on your side for now.

nalimilan avatar Oct 02 '22 20:10 nalimilan

I've a branch which fixes the conflict with master but I can't push the changes to your branch. Can you give me the necessary permissions? See https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/allowing-changes-to-a-pull-request-branch-created-from-a-fork. Otherwise you can do git fetch && git cherry-pick 998393f9d31b29a3bde77373c2fbbfd8844353cb to apply my merge commit to your branch.

nalimilan avatar Oct 08 '22 12:10 nalimilan

I've a branch which fixes the conflict with master but I can't push the changes to your branch. Can you give me the necessary permissions? See https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/allowing-changes-to-a-pull-request-branch-created-from-a-fork. Otherwise you can do git fetch && git cherry-pick 998393f9d31b29a3bde77373c2fbbfd8844353cb to apply my merge commit to your branch.

Hi Milan, Since we have done the PR from an organisation account, GitHub doesn't permit us to allow edits from maintainers. Can we please add you as an outside collaborator to xKDR?

ayushpatnaikgit avatar Oct 09 '22 10:10 ayushpatnaikgit

May I suggest we give @mousum-github commit access here, with the usual rules of working through PRs like we have everyone else?

ViralBShah avatar Oct 09 '22 15:10 ViralBShah

May I suggest we give @mousum-github commit access here

This will not resolve the issue. It is the reverse that is needed (i.e. @nalimilan needs to have access to PR source repository).


Regarding commit access (as it is an independent issue).

If someone from xKDR team (or the whole team) were willing to work on improving GLM.jl in long term it would be really great. There are many open issues in GLM.jl currently + there are loads of things that could be added.

However, is that it is not the person who makes a PR, but a person who reviews it that needs commit access. Currently mostly @nalimilan serves as a reviewer on this repository (and @nalimilan - thank you for that). Of course we could change that in the future (there are many other packages that need attention by @nalimilan so having someone else oversee GLM.jl after some transition period, during which the person would learn JuliaStats standards, would be great).

Given these considerations @mousum-github - would you find having commit access to GLM.jl useful?

bkamins avatar Oct 09 '22 21:10 bkamins

May I suggest we give @mousum-github commit access here

This will not resolve the issue. It is the reverse that is needed (i.e. @nalimilan needs to have access to PR source repository).

Regarding commit access (as it is an independent issue).

If someone from xKDR team (or the whole team) were willing to work on improving GLM.jl in long term it would be really great. There are many open issues in GLM.jl currently + there are loads of things that could be added.

However, is that it is not the person who makes a PR, but a person who reviews it that needs commit access. Currently mostly @nalimilan serves as a reviewer on this repository (and @nalimilan - thank you for that). Of course we could change that in the future (there are many other packages that need attention by @nalimilan so having someone else oversee GLM.jl after some transition period, during which the person would learn JuliaStats standards, would be great).

Given these considerations @mousum-github - would you find having commit access to GLM.jl useful?

Right - my suggestion was indirect - so that @mousum-github can then make future PRs using branches on this repo, and perhaps even move this repo from a fork to a branch. That would make it easier for @nalimilan and everyone to edit directly and what not. And this would also serve as a first step towards having more contributors such as @mousum-github overseeing this repo.

ViralBShah avatar Oct 09 '22 21:10 ViralBShah

I am sure that the commit access will be very useful, and I am happy to help maintain the project going forward.

mousum-github avatar Oct 10 '22 01:10 mousum-github

Thanks @ayushpatnaikgit. Not sure why GitHub doesn't support that, but thanks to the access to the xKDR repo I could push my merge commit. Tests still appear to fail though.

Usually we only give commit access to people who have made several PRs over a relatively long period, but given that xKDR is invested in improving the stats ecosystem I guess we can accelerate that process. I've just sent an invitation to @mousum-github. Help reviewing other PRs would of course be welcome!

nalimilan avatar Oct 10 '22 06:10 nalimilan

Thanks @ayushpatnaikgit. Not sure why GitHub doesn't support that, but thanks to the access to the xKDR repo I could push my merge commit. Tests still appear to fail though.

Usually we only give commit access to people who have made several PRs over a relatively long period, but given that xKDR is invested in improving the stats ecosystem I guess we can accelerate that process. I've just sent an invitation to @mousum-github. Help reviewing other PRs would of course be welcome!

Thanks @nalimilan

mousum-github avatar Oct 10 '22 12:10 mousum-github

Hi @nalimilan, We have updated the code and documentation.

We have updated the newly added nulldeviance and nullloglikelihood functions in glm. Passed the dropcollinear argument value to fit function for null models.

We also added a test case to check the PosDefException exception when dropcollinear is false for a rank-deficient model. Unfortunately, the test case was not passed in a few systems. So, we have commented the test case for timing.

mousum-github avatar Oct 17 '22 19:10 mousum-github

We also added a test case to check the PosDefException exception when dropcollinear is false for a rank-deficient model. Unfortunately, the test case was not passed in a few systems. So, we have commented the test case for timing.

Maybe another example would pass tests more reliably? We really need to check that path. How about trying with categorical variables, like in the "rankdeficient" testset for linear models?

nalimilan avatar Oct 17 '22 20:10 nalimilan

Okay. Let me check.

mousum-github avatar Oct 17 '22 21:10 mousum-github

Okay. Let me check.

Added two test cases similar to rankdeficient.

mousum-github avatar Oct 17 '22 21:10 mousum-github