amazon-sagemaker-examples
Added examples for Distributed Data Parallel (DDP/SMDDP) training with PyTorch Lightning on SageMaker.
AWS SageMaker now supports PyTorch training (single-node and distributed) using Lightning (https://pytorch-lightning.readthedocs.io/en/stable/). The blog post with the announcement will be added to this description once it has been released. This change adds examples of the variants of single-node and multi-node training with Lightning, in particular with the SMDDP backend (https://docs.aws.amazon.com/sagemaker/latest/dg/data-parallel-intro.html).
Issue #, if available: None. The changes here are new, self-contained examples that don't interfere with the existing examples.
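As a rough illustration of what enabling the SMDDP backend looks like from the launcher side, the sketch below builds the keyword arguments for a SageMaker PyTorch estimator. The helper name `smddp_estimator_kwargs`, the `train.py` entry point, the instance count, and the instance type are assumptions chosen for illustration and are not taken from the examples in this change; the `distribution` dictionary is the documented way to turn on the SageMaker data parallel library.

```python
from typing import Any, Dict


def smddp_estimator_kwargs(
    role: str,
    framework_version: str,
    py_version: str,
    entry_point: str = "train.py",  # hypothetical training script name
    instance_count: int = 2,  # assumed multi-node setup
    instance_type: str = "ml.p4d.24xlarge",  # assumed SMDDP-capable instance
) -> Dict[str, Any]:
    """Build estimator keyword arguments with the SMDDP backend enabled."""
    return {
        "entry_point": entry_point,
        "role": role,
        "framework_version": framework_version,
        "py_version": py_version,
        "instance_count": instance_count,
        "instance_type": instance_type,
        # Enables the SageMaker distributed data parallel (SMDDP) library.
        "distribution": {"smdistributed": {"dataparallel": {"enabled": True}}},
    }
```

The resulting dictionary could be passed as `sagemaker.pytorch.PyTorch(**kwargs)` before calling `fit()`; the role, framework version, and Python version are left as required arguments because they depend on the account and container in use.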
Description of changes:
The added examples demonstrate the following scenarios:
- An IPython notebook that demonstrates single-node training on different accelerators (CPU/GPU).
- 6 distributed training jobs, executed via a launcher script, that demonstrate multi-node training with the DDP/SMDDP backends for the MNIST and BERT models. Examples are provided for both the Strategy and Plugin architectures: Strategy is used in the latest Lightning, and Plugin in older Lightning (1.5.10), which customers still use. DDPPlugin: https://github.com/Lightning-AI/lightning/blob/1.5.10/pytorch_lightning/plugins/training_type/ddp.py#L78 ; DDPStrategy: https://github.com/Lightning-AI/lightning/blob/master/src/pytorch_lightning/strategies/ddp.py#L79
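The Plugin/Strategy split above mostly comes down to which class a training script imports. A minimal sketch of selecting it by Lightning version follows; the helper names are hypothetical, and the 1.6 cutoff is an assumption based on the release in which Lightning renamed training-type plugins to strategies. Only the 1.5.10 `DDPPlugin` and the current `DDPStrategy` are confirmed by the links above.

```python
import importlib
from typing import Tuple


def ddp_class_location(lightning_version: str) -> Tuple[str, str]:
    """Return (module, class name) of the DDP implementation for a version."""
    major, minor = (int(part) for part in lightning_version.split(".")[:2])
    if (major, minor) < (1, 6):  # assumed cutoff for the Plugin architecture
        return ("pytorch_lightning.plugins", "DDPPlugin")
    return ("pytorch_lightning.strategies", "DDPStrategy")


def load_ddp_class(lightning_version: str):
    """Import and return the DDP class; requires pytorch_lightning installed."""
    module_name, class_name = ddp_class_location(lightning_version)
    module = importlib.import_module(module_name)
    return getattr(module, class_name)
```

In a training script, `load_ddp_class(pytorch_lightning.__version__)` would yield the class to instantiate and hand to the `Trainer`; the lookup is kept separate from the import so it can be checked without Lightning installed.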
Testing done:
This change adds 7 examples. All of them have been tested on the 570106654206 AWS account. The following are references to successful job executions. In addition, the linter was run to validate the style of the Python notebook.
- ddp plugin mnist: https://us-west-2.console.aws.amazon.com/sagemaker/home?region=us-west-2#/jobs/lightning-ddp-plugin-2022-08-18-11-19-30-380
- ddp strategy bert: https://us-west-2.console.aws.amazon.com/sagemaker/home?region=us-west-2#/jobs/lightning-ddp-strategy-bert-2022-08-18-14-37-31-390
- ddp strategy mnist: https://us-west-2.console.aws.amazon.com/sagemaker/home?region=us-west-2#/jobs/lightning-ddp-strategy-mnist-2022-08-18-14-50-28-999
- smddp plugin mnist: https://us-west-2.console.aws.amazon.com/sagemaker/home?region=us-west-2#/jobs/lightning-smddp-plugin-mnist-2022-08-18-15-12-27-201
- smddp strategy bert: https://us-west-2.console.aws.amazon.com/sagemaker/home?region=us-west-2#/jobs/lightning-smddp-strategy-bert-2022-08-18-17-14-27-434
- smddp strategy mnist: https://us-west-2.console.aws.amazon.com/sagemaker/home?region=us-west-2#/jobs/lightning-smddp-strategy-mnist-2022-08-18-15-32-52-541
- single node mnist: Validated correctness in https://dhimank-dev.notebook.us-west-2.sagemaker.aws/notebooks/lightning/pytorch-lightning-mnist-single-node.ipynb
Merge Checklist
Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your pull request.
- [x] I have read the CONTRIBUTING doc and adhered to the example notebook best practices
- [x] I have updated any necessary documentation, including READMEs
- [x] I have tested my notebook(s) and ensured it runs end-to-end
- [x] I have linted my notebook(s) and code using `tox -e black-format,black-nb-format`
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
AWS CodeBuild CI Report
- CodeBuild project: sagemaker-examples-code-formatting
- Commit ID: d74966e91a3dedc8864ba3c39e775f674cc58803
- Result: SUCCEEDED
- Build Logs (available for 30 days)
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository
AWS CodeBuild CI Report
- CodeBuild project: sagemaker-examples-link-check
- Commit ID: d74966e91a3dedc8864ba3c39e775f674cc58803
- Result: SUCCEEDED
- Build Logs (available for 30 days)
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository
AWS CodeBuild CI Report
- CodeBuild project: sagemaker-examples-grammar
- Commit ID: d74966e91a3dedc8864ba3c39e775f674cc58803
- Result: SUCCEEDED
- Build Logs (available for 30 days)
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository
AWS CodeBuild CI Report
- CodeBuild project: amazon-sagemaker-examples-pr
- Commit ID: d74966e91a3dedc8864ba3c39e775f674cc58803
- Result: FAILED
- Build Logs (available for 30 days)
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository
AWS CodeBuild CI Report
- CodeBuild project: sagemaker-examples-link-check
- Commit ID: cc922b45fa372695ae1a76ea79cc0cf11c6d30e6
- Result: SUCCEEDED
- Build Logs (available for 30 days)
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository
AWS CodeBuild CI Report
- CodeBuild project: sagemaker-examples-grammar
- Commit ID: cc922b45fa372695ae1a76ea79cc0cf11c6d30e6
- Result: SUCCEEDED
- Build Logs (available for 30 days)
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository
AWS CodeBuild CI Report
- CodeBuild project: sagemaker-examples-code-formatting
- Commit ID: cc922b45fa372695ae1a76ea79cc0cf11c6d30e6
- Result: SUCCEEDED
- Build Logs (available for 30 days)
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository
AWS CodeBuild CI Report
- CodeBuild project: amazon-sagemaker-examples-pr
- Commit ID: cc922b45fa372695ae1a76ea79cc0cf11c6d30e6
- Result: FAILED
- Build Logs (available for 30 days)
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository
AWS CodeBuild CI Report
- CodeBuild project: sagemaker-examples-code-formatting
- Commit ID: 50ce44604cded8d2553697ba1ca7d00547b813f2
- Result: SUCCEEDED
- Build Logs (available for 30 days)
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository
AWS CodeBuild CI Report
- CodeBuild project: sagemaker-examples-grammar
- Commit ID: 50ce44604cded8d2553697ba1ca7d00547b813f2
- Result: SUCCEEDED
- Build Logs (available for 30 days)
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository
AWS CodeBuild CI Report
- CodeBuild project: sagemaker-examples-link-check
- Commit ID: 50ce44604cded8d2553697ba1ca7d00547b813f2
- Result: SUCCEEDED
- Build Logs (available for 30 days)
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository
AWS CodeBuild CI Report
- CodeBuild project: amazon-sagemaker-examples-pr
- Commit ID: 50ce44604cded8d2553697ba1ca7d00547b813f2
- Result: FAILED
- Build Logs (available for 30 days)
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository
AWS CodeBuild CI Report
- CodeBuild project: sagemaker-examples-code-formatting
- Commit ID: d965052e190638ce41f5edfc9032a16b7cfeb207
- Result: SUCCEEDED
- Build Logs (available for 30 days)
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository
AWS CodeBuild CI Report
- CodeBuild project: sagemaker-examples-grammar
- Commit ID: d965052e190638ce41f5edfc9032a16b7cfeb207
- Result: SUCCEEDED
- Build Logs (available for 30 days)
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository
AWS CodeBuild CI Report
- CodeBuild project: sagemaker-examples-link-check
- Commit ID: d965052e190638ce41f5edfc9032a16b7cfeb207
- Result: SUCCEEDED
- Build Logs (available for 30 days)
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository
AWS CodeBuild CI Report
- CodeBuild project: amazon-sagemaker-examples-pr
- Commit ID: d965052e190638ce41f5edfc9032a16b7cfeb207
- Result: FAILED
- Build Logs (available for 30 days)
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository
AWS CodeBuild CI Report
- CodeBuild project: sagemaker-examples-link-check
- Commit ID: 757e8d0cb3f3bb6bc69b0325e3d8483de02593a5
- Result: SUCCEEDED
- Build Logs (available for 30 days)
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository
AWS CodeBuild CI Report
- CodeBuild project: sagemaker-examples-code-formatting
- Commit ID: 757e8d0cb3f3bb6bc69b0325e3d8483de02593a5
- Result: SUCCEEDED
- Build Logs (available for 30 days)
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository
AWS CodeBuild CI Report
- CodeBuild project: sagemaker-examples-grammar
- Commit ID: 757e8d0cb3f3bb6bc69b0325e3d8483de02593a5
- Result: SUCCEEDED
- Build Logs (available for 30 days)
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository
AWS CodeBuild CI Report
- CodeBuild project: amazon-sagemaker-examples-pr
- Commit ID: 757e8d0cb3f3bb6bc69b0325e3d8483de02593a5
- Result: SUCCEEDED
- Build Logs (available for 30 days)
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository
AWS CodeBuild CI Report
- CodeBuild project: sagemaker-examples-link-check
- Commit ID: 226345d23d9f821718f4b04a5b2ff6b5ba189128
- Result: SUCCEEDED
- Build Logs (available for 30 days)
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository
AWS CodeBuild CI Report
- CodeBuild project: sagemaker-examples-code-formatting
- Commit ID: 226345d23d9f821718f4b04a5b2ff6b5ba189128
- Result: SUCCEEDED
- Build Logs (available for 30 days)
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository
AWS CodeBuild CI Report
- CodeBuild project: sagemaker-examples-grammar
- Commit ID: 226345d23d9f821718f4b04a5b2ff6b5ba189128
- Result: SUCCEEDED
- Build Logs (available for 30 days)
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository
AWS CodeBuild CI Report
- CodeBuild project: amazon-sagemaker-examples-pr
- Commit ID: 226345d23d9f821718f4b04a5b2ff6b5ba189128
- Result: SUCCEEDED
- Build Logs (available for 30 days)
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository