amazon-sagemaker-examples

Smddp MNIST example update

Open apoorvtintin opened this issue 2 years ago • 24 comments

Issue #, if available:

Description of changes: Updated SMDDP MNIST training example with new APIs and information

Testing done: Yes, tested on SageMaker

Merge Checklist

Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your pull request.

  • [x] I have read the CONTRIBUTING doc and adhered to the example notebook best practices
  • [x] I have updated any necessary documentation, including READMEs
  • [x] I have tested my notebook(s) and ensured it runs end-to-end
  • [x] I have linted my notebook(s) and code using tox -e black-format,black-nb-format

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

apoorvtintin avatar Jun 07 '22 20:06 apoorvtintin

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-examples-code-formatting
  • Commit ID: bc9e9d5853c1d448b04995d877955084b5ec98b6
  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

sagemaker-bot avatar Jun 07 '22 20:06 sagemaker-bot

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-examples-link-check
  • Commit ID: bc9e9d5853c1d448b04995d877955084b5ec98b6
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

sagemaker-bot avatar Jun 07 '22 20:06 sagemaker-bot

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-examples-grammar
  • Commit ID: bc9e9d5853c1d448b04995d877955084b5ec98b6
  • Result: FAILED
  • Build Logs (available for 30 days)

sagemaker-bot avatar Jun 07 '22 20:06 sagemaker-bot

AWS CodeBuild CI Report

  • CodeBuild project: amazon-sagemaker-examples-pr
  • Commit ID: bc9e9d5853c1d448b04995d877955084b5ec98b6
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

sagemaker-bot avatar Jun 07 '22 21:06 sagemaker-bot

@mchoi8739 @jkroll-aws - Requesting your review and help with merge on this PR please. Thank you.

cc @Zha0q1

sandeep-krishnamurthy avatar Jun 08 '22 17:06 sandeep-krishnamurthy

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-examples-code-formatting
  • Commit ID: 045f5c5e9602720d0fe43d95b02cfe441a0b7745
  • Result: FAILED
  • Build Logs (available for 30 days)

sagemaker-bot avatar Jun 09 '22 20:06 sagemaker-bot

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-examples-link-check
  • Commit ID: 045f5c5e9602720d0fe43d95b02cfe441a0b7745
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

sagemaker-bot avatar Jun 09 '22 20:06 sagemaker-bot

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-examples-grammar
  • Commit ID: 045f5c5e9602720d0fe43d95b02cfe441a0b7745
  • Result: FAILED
  • Build Logs (available for 30 days)

sagemaker-bot avatar Jun 09 '22 20:06 sagemaker-bot

@apoorvtintin Thanks for the PR! The code changes look good to me. Would you help update the notebook too? There's some outdated info, e.g.:

1. "This notebook example shows how to use smdistributed.dataparallel with PyTorch in SageMaker using MNIST dataset." (introduce the smddp backend explicitly)

2. "The [data parallel](https://docs.aws.amazon.com/sagemaker/latest/dg/data-parallel.html) feature in this library is a distributed data parallel training framework for PyTorch, TensorFlow, and MXNet." (remove MXNet)

3. "The training script is very similar to a PyTorch training script you might run outside of SageMaker, but modified to run with the smdistributed.dataparallel library. This library's PyTorch client provides an alternative to PyTorch's native DDP." (we DO use PyTorch native DDP now)
   etc.
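
For context on point 3, here is a minimal sketch of how the smddp backend is typically combined with PyTorch's native DDP. The module path `smdistributed.dataparallel.torch.torch_smddp` and the `"smddp"` backend name follow the SageMaker data parallel documentation. The helpers `select_backend` and `wrap_model`, the `gloo` fallback, and the use of `LOCAL_RANK` are assumptions for illustration, not code taken from this PR.

```python
import importlib


def select_backend(fallback="gloo"):
    """Return "smddp" if the SageMaker data parallel library is importable.

    Importing smdistributed.dataparallel.torch.torch_smddp registers the
    "smddp" backend with torch.distributed. Outside a SageMaker training
    container the import fails, so we return a fallback backend instead
    (the fallback is an assumption for local runs).
    """
    try:
        importlib.import_module("smdistributed.dataparallel.torch.torch_smddp")
        return "smddp"
    except ImportError:
        return fallback


def wrap_model(model, backend):
    """Initialize the process group and wrap the model in native PyTorch DDP."""
    # Imported lazily so select_backend() can be tried without PyTorch installed.
    import os

    import torch
    import torch.distributed as dist
    from torch.nn.parallel import DistributedDataParallel as DDP

    # Expects the usual launcher-provided environment (rank, world size, ...).
    dist.init_process_group(backend=backend)

    if torch.cuda.is_available():
        # Assumption: the launcher exports LOCAL_RANK for each process.
        local_rank = int(os.environ.get("LOCAL_RANK", 0))
        torch.cuda.set_device(local_rank)
        return DDP(model.cuda(local_rank), device_ids=[local_rank])
    return DDP(model)


backend = select_backend()
print(f"Selected backend: {backend}")
```

In a training script this would be used as `model = wrap_model(Net(), select_backend())`, where `Net` stands in for the MNIST model. The rest of the training loop stays the same as with any other `torch.distributed` backend.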

Thanks for the review and comments, @Zha0q1 and @mchoi8739. I have updated the wording in the notebook to reflect our new smddp backend. Please let me know if any more changes are needed.

apoorvtintin avatar Jun 09 '22 20:06 apoorvtintin

AWS CodeBuild CI Report

  • CodeBuild project: amazon-sagemaker-examples-pr
  • Commit ID: 045f5c5e9602720d0fe43d95b02cfe441a0b7745
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

sagemaker-bot avatar Jun 09 '22 20:06 sagemaker-bot

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-examples-link-check
  • Commit ID: d8b07763d4c244a5211e57ee0892328ff7839e3b
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

sagemaker-bot avatar Jun 13 '22 17:06 sagemaker-bot

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-examples-code-formatting
  • Commit ID: d8b07763d4c244a5211e57ee0892328ff7839e3b
  • Result: FAILED
  • Build Logs (available for 30 days)

sagemaker-bot avatar Jun 13 '22 17:06 sagemaker-bot

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-examples-grammar
  • Commit ID: d8b07763d4c244a5211e57ee0892328ff7839e3b
  • Result: FAILED
  • Build Logs (available for 30 days)

sagemaker-bot avatar Jun 13 '22 17:06 sagemaker-bot

AWS CodeBuild CI Report

  • CodeBuild project: amazon-sagemaker-examples-pr
  • Commit ID: d8b07763d4c244a5211e57ee0892328ff7839e3b
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

sagemaker-bot avatar Jun 13 '22 17:06 sagemaker-bot