Integration with DCP

Open LucasLLC opened this issue 11 months ago • 3 comments

Description

Please read our CONTRIBUTING.md prior to creating your first pull request.

Please include a summary of the feature or issue being fixed. Please also include relevant motivation and context. List any dependencies that are required for this change.

Testing out some checkpointing code.

PR description is WIP

Fixes #(issue)

Type of change

Please delete options that are not relevant.

  • [ ] Bug fix (non-breaking change which fixes an issue)
  • [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • [x] New feature (non-breaking change which adds functionality)
  • [ ] This change requires a documentation update

Feature/Issue validation/testing

Please describe the Unit or Integration tests that you ran to verify your changes and relevant result summary. Provide instructions so it can be reproduced. Please also list any relevant details for your test configuration.

  • [ ] Test A Logs for Test A

  • [ ] Test B Logs for Test B

Checklist:

  • [ ] Have you added tests that prove your fix is effective or that this feature works?
  • [ ] Has code been commented, particularly in hard-to-understand areas?
  • [ ] Have you made corresponding changes to the documentation?

LucasLLC • Mar 18 '24 22:03

Thanks for making it work! Quick comment: Do you mind creating a dedicated example for DCP + PP? You can copy the model out (we plan to build a "model hub" for tests, so that would solve the duplicated code problem).

kwen2501 • Mar 19 '24 17:03

What's our plan for this PR? @LucasLLC I think we are pretty close to the destination. Would the following next steps be reasonable?

  1. Move the example to examples/checkpoint, and name it pippy_dcp.py.
  2. Focus on Option 1 (per-stage saving), and clean up the UI. (See comments)
  3. Make the example runnable in a multi-process setting. Today it saves the stages one after another in a for loop; it would be nice if multiple ranks could each save their own stage simultaneously.
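
To make steps 2 and 3 concrete, here is a rough, stdlib-only sketch of per-stage saving done concurrently. It is not the PR's code: threads stand in for ranks, JSON files stand in for DCP's on-disk format, and the names `stage_checkpoint_dir`, `save_stage`, `save_all_stages` and the `stage_<idx>` directory layout are all hypothetical. In a real multi-process run, each rank would presumably make a DCP save call on its own stage's state dict where `save_stage` writes the file.

```python
import json
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def stage_checkpoint_dir(root: Path, stage_idx: int) -> Path:
    """Hypothetical layout: one sub-directory per pipeline stage."""
    return root / f"stage_{stage_idx}"


def save_stage(root: Path, stage_idx: int, stage_state: dict) -> Path:
    """Save a single stage's state (stand-in for a DCP save on that rank)."""
    out_dir = stage_checkpoint_dir(root, stage_idx)
    out_dir.mkdir(parents=True, exist_ok=True)
    out_file = out_dir / "state.json"
    out_file.write_text(json.dumps(stage_state))
    return out_file


def save_all_stages(root: Path, stage_states: list[dict]) -> list[Path]:
    """Each simulated rank saves only its own stage; saves run concurrently."""
    # Threads simulate ranks here (assumption); real ranks are separate processes.
    with ThreadPoolExecutor(max_workers=max(1, len(stage_states))) as pool:
        futures = [
            pool.submit(save_stage, root, idx, state)
            for idx, state in enumerate(stage_states)
        ]
        # Results are collected in stage order, not completion order.
        return [f.result() for f in futures]


if __name__ == "__main__":
    # Toy per-stage "state dicts" (plain lists instead of tensors).
    states = [{"layer0.weight": [0.1, 0.2]}, {"layer1.weight": [0.3, 0.4]}]
    with tempfile.TemporaryDirectory() as tmp:
        for path in save_all_stages(Path(tmp), states):
            print(path.relative_to(tmp))
```

Because every stage writes to its own directory, the ranks never touch the same files, so the saves need no coordination beyond agreeing on the checkpoint root.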

kwen2501 • Mar 27 '24 02:03

For code quality checks, please run:

./format.sh
./check.sh

kwen2501 • Mar 27 '24 02:03