p3.16xlarge vs p3dn.24xlarge vs p4d.24xlarge with updated hyperparame…

Open mlonaws opened this issue 3 years ago • 8 comments

The p4d instance provides a 21.9% decrease in training time compared to p3dn.

Description of changes:

Hyperparameters Updated:

  • activation_checkpointing = True (1)
  • active_microbatches = 4
  • microbatches = 4
  • max_steps = 80
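
For reviewers who want the updated values in one place, here is a minimal sketch of the four hyperparameters as a Python dict. The keys and values come from the list above; the dict name `updated_hyperparameters` and the helper `as_cli_args` are hypothetical and are not taken from the notebook. The helper only illustrates how the values would look as command-line style tokens, assuming the training script reads them as arguments.

```python
# Values copied from the "Hyperparameters Updated" list in this PR.
# The dict name is a hypothetical label, not the notebook's variable.
updated_hyperparameters = {
    "activation_checkpointing": 1,  # listed in the PR as True (1)
    "active_microbatches": 4,
    "microbatches": 4,
    "max_steps": 80,
}


def as_cli_args(hyperparameters):
    """Render a dict as ["--key", "value", ...] tokens (hypothetical helper)."""
    args = []
    for key, value in hyperparameters.items():
        args.extend([f"--{key}", str(value)])
    return args


if __name__ == "__main__":
    print(as_cli_args(updated_hyperparameters))
```

Keeping the values in a single dict makes it easier to rerun the same configuration across the three instance options tested below.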

Testing done:

Option 1: Instance type and count

  • instance_type = "ml.p3.16xlarge"
  • instance_count = 2

Training failed (CUDA out of memory)

Option 2: Instance type and count

  • instance_type = "ml.p3dn.24xlarge"
  • instance_count = 1

Training successful

Option 3: Instance type and count

  • instance_type = "ml.p4d.24xlarge"
  • instance_count = 1

Training successful; training time decreased by 21.9% compared to the ml.p3dn.24xlarge instance.
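
The 21.9% figure is a relative decrease in wall-clock training time. The sketch below shows how such a figure can be computed and what speedup factor it implies. The function names are hypothetical, and the two durations in the example are placeholder values chosen only to exercise the arithmetic; they are not the measured training times from this PR, which are not stated here.

```python
def percent_decrease(baseline_seconds, new_seconds):
    """Relative decrease of new_seconds versus baseline_seconds, in percent."""
    if baseline_seconds <= 0:
        raise ValueError("baseline_seconds must be positive")
    return (baseline_seconds - new_seconds) / baseline_seconds * 100.0


def speedup_from_decrease(decrease_percent):
    """Speedup factor implied by a given percent decrease in run time."""
    if not 0 <= decrease_percent < 100:
        raise ValueError("decrease_percent must be in [0, 100)")
    return 1.0 / (1.0 - decrease_percent / 100.0)


if __name__ == "__main__":
    # Placeholder durations, NOT measurements from this PR.
    print(round(percent_decrease(1000.0, 781.0), 1))
    # Speedup implied by the 21.9% decrease reported above.
    print(round(speedup_from_decrease(21.9), 2))
```

By this arithmetic, a 21.9% decrease in training time corresponds to roughly a 1.28x speedup of p4d over p3dn for this run.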

Issue #, if available:

Merge Checklist

Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your pull request.

  • [ ] I have read the CONTRIBUTING doc and adhered to the example notebook best practices
  • [ ] I have updated any necessary documentation, including READMEs
  • [ ] I have tested my notebook(s) and ensured it runs end-to-end
  • [ ] I have linted my notebook(s) and code using tox -e black-format,black-nb-format

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

mlonaws, Apr 11 '22 07:04

Check out this pull request on ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-examples-link-check
  • Commit ID: 4a47e4d3530a00b510a5d4ce4ce96b1de370a779
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

sagemaker-bot, Apr 11 '22 07:04

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-examples-code-formatting
  • Commit ID: 4a47e4d3530a00b510a5d4ce4ce96b1de370a779
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

sagemaker-bot, Apr 11 '22 07:04

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-examples-grammar
  • Commit ID: 4a47e4d3530a00b510a5d4ce4ce96b1de370a779
  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

sagemaker-bot, Apr 11 '22 07:04

AWS CodeBuild CI Report

  • CodeBuild project: amazon-sagemaker-examples-pr
  • Commit ID: 4a47e4d3530a00b510a5d4ce4ce96b1de370a779
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

sagemaker-bot, Apr 11 '22 07:04