pyGAM

[WIP] big data GAM

Open · dswah opened this pull request (#188) 6 years ago · 10 comments

fixes https://github.com/dswah/pyGAM/issues/187, https://github.com/dswah/pyGAM/issues/76, and https://github.com/dswah/pyGAM/issues/124

write an out-of-core example like pomegranate's: https://pomegranate.readthedocs.io/en/latest/ooc.html

  • [x] QR updating

  • [x] documentation

  • [x] all methods avoid using full model matrix

  • [x] statistics estimation work with new pirls

  • [x] simplify statistics estimation

  • [x] gamma is an instance argument

  • [x] chunk size is an instance argument

  • [x] all models inherit new behavior

  • [x] test with large dataset

  • [x] write parallel version

  • [x] ensure parallel version works in serial

  • [ ] do memory profiling; see if we can easily optimize memory usage anywhere

  • [x] try parallelism

  • [x] merge @maorn 'parallel' branch into this one

  • [x] logic for skipping any parallelism if n_cores==1 (joblib automatically does this)

  • [ ] add some tests for the new features

  • [ ] fix a couple of broken tests

  • [ ] figure out looping in partial_dependence...

  • [ ] get rid of matrix vs ndarray warnings
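The "QR updating" and "all methods avoid using full model matrix" items can be illustrated with a minimal NumPy sketch. It assumes a Gaussian response with identity link, so one PIRLS step is plain penalized least squares (for other families each chunk's rows and response would first be scaled by the square roots of the working weights), and it assumes the penalty is given as a square-root matrix `P` with `P.T @ P` equal to the penalty matrix. `update_qr` and `fit_penalized_chunked` are hypothetical helpers, not pyGAM's API. Only the running p-by-p factor `R` and the projected response `f` persist between chunks.

```python
import numpy as np


def update_qr(R, f, X_chunk, y_chunk):
    """Fold one chunk of rows into the running triangular factor.

    R is the current upper-triangular factor and f = Q^T y for all rows seen
    so far; only these small arrays persist between chunks.
    """
    stacked = np.vstack([R, X_chunk])
    rhs = np.concatenate([f, y_chunk])
    Q, R_new = np.linalg.qr(stacked)  # reduced QR: Q has R_new.shape[0] columns
    return R_new, Q.T @ rhs


def fit_penalized_chunked(chunks, penalty_sqrt):
    """Solve min ||y - X b||^2 + ||P b||^2 from an iterable of (X_chunk, y_chunk).

    penalty_sqrt is P, assumed to satisfy P.T @ P == penalty matrix.
    """
    p = penalty_sqrt.shape[1]
    R = np.zeros((0, p))  # empty factor: no rows seen yet
    f = np.zeros(0)
    for X_chunk, y_chunk in chunks:
        R, f = update_qr(R, f, X_chunk, y_chunk)
    # The penalty enters as extra pseudo-rows whose response is zero.
    R, f = update_qr(R, f, penalty_sqrt, np.zeros(penalty_sqrt.shape[0]))
    return np.linalg.solve(R, f)  # R is p x p once at least p rows were folded in


# Demo on synthetic data: chunked result vs. a direct ridge-type solve.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = X @ np.arange(1.0, 6.0) + rng.normal(scale=0.1, size=1000)
P = np.sqrt(2.0) * np.eye(5)
chunk_size = 128
chunks = ((X[i:i + chunk_size], y[i:i + chunk_size]) for i in range(0, len(y), chunk_size))
b_chunked = fit_penalized_chunked(chunks, P)
b_direct = np.linalg.solve(X.T @ X + P.T @ P, X.T @ y)
```

Because `chunks` is a generator, the rows could just as well be read from disk one block at a time; peak memory then scales with `chunk_size` rather than with the number of rows.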

subsequent PR?

  • [x] use joblib with Pool? (this will enable use of dask)
  • [x] use batch_size instead of block_size
  • [ ] enable mini-batches, add batches_per_epoch parameter and partial_fit method
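The parallel items (including the `n_cores==1` shortcut and "use joblib with Pool?") can be sketched as a tall-skinny QR: each chunk is reduced to its own small triangular factor independently, and the factors are merged with one more QR. This minimal NumPy sketch uses a standard-library thread pool only to stay self-contained; with joblib the analogous fan-out would be `Parallel(n_jobs=n_jobs)(delayed(f)(chunk) for chunk in chunks)`, which already runs sequentially when `n_jobs=1`. `chunk_factor`, `combine_factors`, and `parallel_factor` are hypothetical names, and no penalty is included.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor


def chunk_factor(X_chunk, y_chunk):
    """Reduce one chunk to its triangular factor R and projected response Q^T y."""
    Q, R = np.linalg.qr(X_chunk)
    return R, Q.T @ y_chunk


def combine_factors(factors):
    """Merge per-chunk (R, f) pairs with one more QR of the stacked factors."""
    R_stack = np.vstack([R for R, _ in factors])
    f_stack = np.concatenate([f for _, f in factors])
    Q, R = np.linalg.qr(R_stack)
    return R, Q.T @ f_stack


def parallel_factor(chunks, n_jobs=1):
    """Factor chunks independently; skip the pool entirely when n_jobs == 1."""
    if n_jobs == 1:
        factors = [chunk_factor(X_chunk, y_chunk) for X_chunk, y_chunk in chunks]
    else:
        with ThreadPoolExecutor(max_workers=n_jobs) as pool:
            # map() returns results in input order, so factors stay aligned with chunks
            factors = list(pool.map(lambda c: chunk_factor(*c), chunks))
    return combine_factors(factors)


rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 4))
y = X @ np.array([0.5, -1.0, 2.0, 3.0]) + rng.normal(scale=0.1, size=2000)
chunks = [(X[i:i + 250], y[i:i + 250]) for i in range(0, 2000, 250)]
R_serial, f_serial = parallel_factor(chunks, n_jobs=1)
R_threads, f_threads = parallel_factor(chunks, n_jobs=4)
b_serial = np.linalg.solve(R_serial, f_serial)
b_threads = np.linalg.solve(R_threads, f_threads)
```

The per-chunk QR already calls into LAPACK/BLAS, which may itself be multithreaded, so fanning chunks out across workers can yield little extra speedup.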

[Figure: memory profile]

dswah, Jul 22 '18

Codecov Report

❗ No coverage uploaded for pull request base (master@b986ec5). The diff coverage is n/a.

@@            Coverage Diff            @@
##             master     #188   +/-   ##
=========================================
  Coverage          ?   91.33%           
=========================================
  Files             ?       19           
  Lines             ?     2492           
  Branches          ?        0           
=========================================
  Hits              ?     2276           
  Misses            ?      216           
  Partials          ?        0

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update b986ec5...03898ea.

codecov[bot], Jul 23 '18

awesome! I just tried a dataset that crashes my notebook when no partitioning is used, but that solves correctly when the optimization is incremental!

dswah, Jul 24 '18

great 😊,

I will convert your for loop into a parallel one over the weekend.

Maor

maorn, Jul 24 '18

hi,

I have changed the code; it should now work in parallel.

I cannot push to the branch. Can you give me access?

Regards,

Maor

maorn, Jul 26 '18

@maorn that is really cool!

to contribute your code, please do the following:

  • put your changes in a safe place
  • fork the repo, and clone your fork on your computer
  • commit your changes (i.e. the parallel code in pygam.py)
  • push your changes to your remote repo fork
  • open a pull request from your remote repo to this branch

Attention! Please make sure that you don't lose the code you've already written!

  • copy it somewhere safe before forking/cloning

looking forward to reading your code :)

dswah, Jul 26 '18

hi, what is the state of this branch? Is there anything missing on my end before it can be merged into the master branch?

maorn, Oct 21 '18

hi @maorn! I think there are still a couple of things we need to do before we merge:

  • [x] a rebase of your 'parallel' branch off of this one
  • [x] logic for skipping any parallelism if n_cores==1
  • [ ] logic for partial dependence and quantiles that uses the new features
  • [ ] add some tests for the new features
  • [ ] fix a couple of broken tests

dswah, Oct 21 '18

Hi @maorn and @dswah, may I ask about the status of this work? Do you plan to merge it into master?

mohsenzabihi, May 17 '19

@mohsenzabihi @ccurro The plan is to merge this branch into master in August.

But it needs a little love right now. Specifically, I need to:

  • adapt all remaining occurrences of gam._modelmat, like in partial dependence and quantiles, to use the new blockwise scheme
  • remove joblib for now since it doesn't look like we get any benefit from parallelizing linear algebra operations
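Adapting partial dependence and confidence intervals to the blockwise scheme mostly means never holding every row of the model matrix at once. Below is a minimal sketch, assuming a hypothetical `make_design` callable that maps raw features to model-matrix rows, plus a fitted coefficient vector `coef` with covariance `cov`. Neither `blockwise_predict` nor `make_design` is pyGAM's API. For each chunk it computes the linear predictor and the pointwise standard error sqrt(diag(B cov Bᵀ)) without forming the chunk-by-chunk matrix B cov Bᵀ.

```python
import numpy as np


def blockwise_predict(make_design, X, coef, cov, chunk_size=1000):
    """Linear predictor and pointwise standard errors, chunk by chunk.

    make_design (hypothetical) maps raw feature rows to model-matrix rows;
    only chunk_size rows of the model matrix are in memory at any time.
    """
    n = X.shape[0]
    eta = np.empty(n)
    se = np.empty(n)
    for start in range(0, n, chunk_size):
        B = make_design(X[start:start + chunk_size])
        stop = start + B.shape[0]
        eta[start:stop] = B @ coef
        # diag(B @ cov @ B.T) computed row-wise, never forming the m x m matrix
        se[start:stop] = np.sqrt(np.sum((B @ cov) * B, axis=1))
    return eta, se


def make_design(X):
    """Hypothetical design: intercept, linear and squared terms per feature."""
    return np.column_stack([np.ones(len(X)), X, X ** 2])


rng = np.random.default_rng(2)
X = rng.normal(size=(1200, 2))
coef = rng.normal(size=5)
A = rng.normal(size=(5, 5))
cov = A @ A.T  # symmetric positive semi-definite stand-in for a covariance
eta, se = blockwise_predict(make_design, X, coef, cov, chunk_size=250)
```

Under a normal approximation, an approximate 95% interval on the link scale would be `eta ± 1.96 * se`. A single term's partial dependence would use only that term's columns and the matching sub-block of `cov`.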

dswah, Jul 16 '19

I know this PR is pretty old, but I'd still be really happy to see this functionality implemented. Figured I'd mention it, since it's been a couple of years since there have been any updates.

tjburch, Feb 28 '22