pyGAM
[WIP] big data GAM
fixes https://github.com/dswah/pyGAM/issues/187 https://github.com/dswah/pyGAM/issues/76 fixes https://github.com/dswah/pyGAM/issues/124
write an out-of-core example like pomegranate's: https://pomegranate.readthedocs.io/en/latest/ooc.html
- [x] QR updating
- [x] documentation
- [x] all methods avoid using full model matrix
- [x] statistics estimation works with new pirls
- [x] simplify statistics estimation
- [x] gamma is an instance argument
- [x] chunk size is an instance argument
- [x] all models inherit new behavior
- [x] test with large dataset
- [x] write parallel version
- [x] ensure parallel version works in serial
- [ ] do memory profiling. see if we can easily optimize memory anywhere
- [x] try parallelism
- [x] merge @maorn 'parallel' branch into this one
- [x] logic for skipping any parallelism if `n_cores==1` (joblib automatically does this)
- [ ] add some tests for the new features
- [ ] fix a couple of broken tests
- [ ] figure out looping in partial_dependence...
- [ ] get rid of matrix vs ndarray warnings
subsequent PR?
- [x] use joblib with Pool? (this will enable use of dask)
- [x] use `batch_size` instead of `block_size`
- [ ] enable mini-batches, add `batches_per_epoch` parameter and `partial_fit` method
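For readers unfamiliar with the "QR updating" item in the checklist, here is a minimal sketch of the general idea, assuming plain unweighted least squares rather than this PR's actual PIRLS loop. The names `chunked_qr_lstsq` and `chunk_size` are hypothetical and are not taken from the pyGAM code base. The point is that only a small triangular factor is kept between chunks, so the full matrix never has to be factorized at once.

```python
import numpy as np

def chunked_qr_lstsq(X, y, chunk_size=1000):
    """Solve min ||X b - y|| by updating a QR factorization chunk by chunk.

    Hypothetical illustration: assumes X has full column rank and at
    least as many rows as columns.
    """
    n, p = X.shape
    R = np.zeros((0, p))   # running triangular factor (starts empty)
    Qty = np.zeros(0)      # running Q^T y, same number of rows as R
    for start in range(0, n, chunk_size):
        Xb = X[start:start + chunk_size]
        yb = y[start:start + chunk_size]
        # Stack the previous R on top of the new rows and re-factorize.
        Q, R = np.linalg.qr(np.vstack([R, Xb]))
        # Apply the same orthogonal transform to the right-hand side.
        Qty = Q.T @ np.concatenate([Qty, yb])
    # R is now p x p; solve the small triangular system R b = Q^T y.
    return np.linalg.solve(R, Qty)

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 4))
y = X @ np.array([1.0, -2.0, 0.5, 3.0]) + 0.1 * rng.normal(size=5000)
coef_chunked = chunked_qr_lstsq(X, y, chunk_size=700)
coef_full = np.linalg.lstsq(X, y, rcond=None)[0]
```

In this sketch `X` is an in-memory array for simplicity; in an out-of-core setting each chunk would instead be read or generated on demand.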
Codecov Report
:exclamation: No coverage uploaded for pull request base (master@b986ec5). The diff coverage is n/a.
@@ Coverage Diff @@
## master #188 +/- ##
=========================================
Coverage ? 91.33%
=========================================
Files ? 19
Lines ? 2492
Branches ? 0
=========================================
Hits ? 2276
Misses ? 216
Partials ? 0
Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update b986ec5...03898ea.
awesome!!!! just tried a dataset that crashes my notebook when no partitioning is used, but that correctly solves when the optimization is incremental!!!!!
great 😊,
i will convert your for loop into a parallel one during the weekend
Maor
From: daniel servén [email protected] Sent: Tuesday, July 24, 2018 12:56:40 PM To: dswah/pyGAM Cc: Subscribed Subject: Re: [dswah/pyGAM] [WIP] big data GAM (#188)
hi,
i have changed the code and it should now work in parallel.
i cannot push it to the branch, can you give me access?
Regards,
Maor
From: Maor Nissan Sent: Tuesday, July 24, 2018 1:36:02 PM To: dswah/pyGAM; dswah/pyGAM Cc: Subscribed Subject: Re: [dswah/pyGAM] [WIP] big data GAM (#188)
@maorn that is really cool!
to contribute your code, please do the following:
- put your changes in a safe place
- fork the repo, and clone your fork on your computer
- commit your changes (i.e. the parallel code in pygam.py)
- push your changes to your remote repo fork
- open a pull request from your remote repo to this branch
Attention!! please make sure that you don't lose the code you've already written!
- copy it or something before forking/cloning...
looking forward to reading your code :)
hi, what is the state of this branch? is there anything still needed on my end before it can be merged into the master branch?
hi @maorn! i think there are still a couple of things we need to do before we merge:
- [x] a rebase of your 'parallel' branch off of this one
- [x] logic for skipping any parallelism if `n_cores==1`
- [ ] logic for partial dependence and quantiles that uses the new features
- [ ] add some tests for the new features
- [ ] fix a couple of broken tests
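As a rough illustration of the "skip any parallelism if `n_cores==1`" item, here is a minimal sketch that uses the standard library's `concurrent.futures` as a stand-in for joblib. The function names `block_gram` and `gram_by_blocks` are hypothetical, and summing per-block Gram matrices is only an example workload, not the PR's actual computation.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np

def block_gram(Xb):
    """Per-block contribution X_b^T X_b (example workload only)."""
    return Xb.T @ Xb

def gram_by_blocks(X, batch_size=1000, n_cores=1):
    """Accumulate X^T X over row blocks, in serial or with a worker pool."""
    blocks = [X[i:i + batch_size] for i in range(0, X.shape[0], batch_size)]
    if n_cores == 1:
        # Serial path: no pool is created at all.
        parts = [block_gram(b) for b in blocks]
    else:
        # Parallel path: a thread pool stands in for joblib here.
        with ThreadPoolExecutor(max_workers=n_cores) as pool:
            parts = list(pool.map(block_gram, blocks))
    return sum(parts)

rng = np.random.default_rng(1)
X = rng.normal(size=(2500, 3))
gram_serial = gram_by_blocks(X, batch_size=400, n_cores=1)
gram_parallel = gram_by_blocks(X, batch_size=400, n_cores=2)
```

Both paths return the same result, so the parallel version can be tested against the serial one.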
Hi @maorn and @dswah, may I know about the status of this work? do you plan to merge it into master?
@mohsenzabihi @ccurro The plan is to merge this branch into master in August.
But it needs a little love right now. Specifically, i need to
- adapt all remaining occurrences of `gam._modelmat`, like in partial dependence and quantiles, to use the new blockwise scheme
- remove joblib for now since it doesn't look like we get any benefit from parallelizing linear algebra operations
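To make the "blockwise scheme" concrete, here is a minimal sketch of evaluating a linear predictor one block of rows at a time, so the full model matrix is never held in memory. `modelmat_block` is a hypothetical stand-in (a small polynomial basis) for whatever builds model-matrix rows, and `linear_predictor_blockwise` and `batch_size` are likewise illustrative names, not pyGAM's API.

```python
import numpy as np

def modelmat_block(X_block):
    """Hypothetical model-matrix builder: columns [1, x, x^2] of feature 0."""
    x = X_block[:, 0]
    return np.column_stack([np.ones_like(x), x, x ** 2])

def linear_predictor_blockwise(X, coef, batch_size=1000):
    """Compute modelmat(X) @ coef without forming the full model matrix."""
    out = np.empty(X.shape[0])
    for start in range(0, X.shape[0], batch_size):
        stop = start + batch_size
        # Only this block's model-matrix rows exist at any one time.
        out[start:stop] = modelmat_block(X[start:stop]) @ coef
    return out

rng = np.random.default_rng(2)
X = rng.uniform(-1.0, 1.0, size=(3500, 1))
coef = np.array([0.5, -1.0, 2.0])
lp_blockwise = linear_predictor_blockwise(X, coef, batch_size=600)
lp_full = modelmat_block(X) @ coef
```

The same loop shape would apply to other row-wise quantities, which is why methods that currently need the whole matrix could in principle be adapted.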
I know this PR is pretty old, but I'd still be really happy to see this functionality implemented. Figured I'd just mention it since it's been a couple of years since there have been any updates.