giotto-deep icon indicating copy to clipboard operation
giotto-deep copied to clipboard

Parallel training

Open yorickbrunet opened this issue 1 year ago β€’ 16 comments

Types of changes

  • [ ] Bug fix (non-breaking change which fixes an issue)
  • [] New feature (non-breaking change which adds functionality)
  • [x] Breaking change (fix or feature that would cause existing functionality to change)

Description This work introduces the parallelisation of giotto-deep on multiple GPUs via two methods: pytorch's FSDP and pipeline tools.

The version of pytorch is increased to 1.13.1 to support the necessary features of FSDP.

To allow the parallelisation to be efficiently run, a new sampler GiottoSampler is defined that combines DistributedSampler and SubsetRandomSampler.

A benchmark tool allows running a model with different batch sizes on different GPUs and number of GPUs to compare the parallelised and not-parallelised runs. A generator of Kubernetes pods takes some GKE details as input and outputs the pod configurations, allowing a user to build its own configurations for its own cluster.

Any other comments?

Checklist

  • [x] I have read the guidelines for contributing.
  • [ ] My code follows the code style of this project. I used flake8 to check my Python changes.
  • [x] My change requires a change to the documentation.
  • [x] I have updated the documentation accordingly.
  • [ ] I have added tests to cover my changes.
  • [x] All new and existing tests passed. I used pytest to check this on Python tests.

yorickbrunet avatar Nov 24 '23 15:11 yorickbrunet

Check out this pull request onΒ  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

CLA assistant check
All committers have signed the CLA.

CLAassistant avatar Nov 24 '23 15:11 CLAassistant

@yorickbrunet Thanks for your contribution. Please let me know when the PR is ready for review.

raphaelreinauer avatar Nov 28 '23 14:11 raphaelreinauer

@raphaelreinauer All the tests pass on linux containers. I suppose you can start your review. Thanks!

yorickbrunet avatar Dec 07 '23 07:12 yorickbrunet

@yorickbrunet Thanks for your contribution. Please let me know when the PR is ready for review.

Hi @yorickbrunet, thank you so much for your hard work on this! 😊 I noticed the PR has a lot of file changes - 52, in fact! To help me, could you please add a description to the PR? A bit of context makes the review process much easier for me.

raphaelreinauer avatar Dec 10 '23 10:12 raphaelreinauer

@raphaelreinauer I see that @yorickbrunet answered to all your comments. Are you satisfied or is there anything else you would like to discuss? If there is nothing more, I'll merge the PR.

matteocao avatar Dec 13 '23 19:12 matteocao

@matteocao thanks for checking in and thanks @yorickbrunet for your answers so far.

However, there's one key aspect that still needs attention - the PR description:

Hi @yorickbrunet, thank you so much for your hard work on this! 😊 I noticed the PR has a lot of file changes - 52, in fact! To help me, could you please add a description to the PR? A bit of context makes the review process much easier for me.

@yorickbrunet Could you please add a detailed description of all the features you added, then I can do a detailed PR review.

raphaelreinauer avatar Dec 13 '23 19:12 raphaelreinauer

@raphaelreinauer we answered or fixed all comments. Can you have another look, close the issues that can be closed, and maybe approve the PR? Thanks

yorickbrunet avatar Jan 09 '24 15:01 yorickbrunet

Hi @yorickbrunet, thank you so much for your hard work on this! 😊 I noticed the PR has a lot of file changes - 52, in fact! To help me, could you please add a description to the PR? A bit of context makes the review process much easier for me.

Hi @yorickbrunet, could you please provide a detailed description of the features added in this PR to aid my review process? Thanks

raphaelreinauer avatar Jan 09 '24 19:01 raphaelreinauer

Hi @yorickbrunet, thank you so much for your hard work on this! 😊 I noticed the PR has a lot of file changes - 52, in fact! To help me, could you please add a description to the PR? A bit of context makes the review process much easier for me.

Hi @yorickbrunet, could you please provide a detailed description of the features added in this PR to aid my review process? Thanks

Hi @raphaelreinauer. I improved the general description of the PR, but the important description was there: This work introduces the parallelisation of giotto-deep on multiple GPUs via two methods: pytorch's FSDP and pipeline tools. Even though there are 52 files modified, the modifications are quite small in many of them. The most interesting file is gdeep/trainer/trainer.py, where most of the modifications were done.

yorickbrunet avatar Jan 10 '24 08:01 yorickbrunet

Thanks, @yorickbrunet, for the changes. I'll review the changes by next week Friday.

raphaelreinauer avatar Jan 14 '24 15:01 raphaelreinauer

Unfortunately, I didn't have time to review the PR last weekend, but I'll do it this weekend. Sorry for the delay.

raphaelreinauer avatar Jan 23 '24 22:01 raphaelreinauer

Hey @yorickbrunet and @matteocao, I noticed that there have yet to be any comments on the PR reviews. Could you give some comments? So that we can merge this PR.

raphaelreinauer avatar Mar 03 '24 15:03 raphaelreinauer

Hey @yorickbrunet and @matteocao, I noticed that there have yet to be any comments on the PR reviews. Could you give some comments? So that we can merge this PR.

Hi @raphaelreinauer I answered all comments with either a modification of the code or some justification why it cannot/won't be modified. Basically, the project is closed and we cannot spend two more weeks adding tests, retesting the setup after modification, etc.

yorickbrunet avatar Mar 07 '24 09:03 yorickbrunet

Hey @yorickbrunet and @matteocao, I noticed that there have yet to be any comments on the PR reviews. Could you give some comments? So that we can merge this PR.

@raphaelreinauer @VascoSch92 Can you please close all comments that you consider OK? So that we know where we stand. Thanks.

yorickbrunet avatar Mar 08 '24 13:03 yorickbrunet

Hey @yorickbrunet and @matteocao, I noticed that there have yet to be any comments on the PR reviews. Could you give some comments? So that we can merge this PR.

@raphaelreinauer @VascoSch92 Can you please close all comments that you consider OK? So that we know where we stand. Thanks.

@yorickbrunet for me is good. You can resolve the discussions ;-) (i cannot)

VascoSch92 avatar Mar 08 '24 15:03 VascoSch92