sagemaker-training-toolkit

SM library telemetry improvement

Open roywei opened this issue 3 years ago • 2 comments

  • Add segfault error attribution.
  • Increase the failure reason limit to 8K (WIP on the SM side; no total limit is enforced on the SMTT side).
  • Limit the error message part of the failure reason to 7K, leaving 1K characters for the error name, error code, and command.
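A minimal Python sketch of the two behaviours listed above: flagging a segfault from a child process's exit status, and capping only the error-message part of the failure reason at 7K. The function names `describe_exit` and `build_failure_reason`, the error names `SegmentationFault` and `UserScriptError`, and the message layout are all hypothetical, not the toolkit's actual API. It also assumes "K" means 1024 characters and that the *tail* of the message (where a traceback ends) is the part worth keeping.

```python
import signal

# Hypothetical limits mirroring the PR description ("K" assumed to be 1024 chars).
MAX_FAILURE_REASON = 8 * 1024  # enforced on the SageMaker side, not here
MAX_ERROR_MESSAGE = 7 * 1024   # leaves ~1K for error name, exit code and command


def describe_exit(returncode):
    """Return a hypothetical error name, attributing segfaults explicitly."""
    # On POSIX, subprocess reports death-by-signal as a negative return code
    # (-11 for SIGSEGV); a shell wrapper reports 128 + signal number (139).
    if returncode in (-signal.SIGSEGV, 128 + signal.SIGSEGV):
        return "SegmentationFault"
    return "UserScriptError"


def build_failure_reason(error_name, returncode, cmd, message):
    """Assemble a failure reason, truncating only the error-message part."""
    # Assumption: keep the last MAX_ERROR_MESSAGE characters, since the final
    # lines of a traceback usually name the exception.
    if len(message) > MAX_ERROR_MESSAGE:
        message = message[-MAX_ERROR_MESSAGE:]
    header = (
        f"{error_name}:\n"
        f"ExitCode {returncode}\n"
        f"Command {' '.join(cmd)}\n"
        "ErrorMessage "
    )
    # No total cap is applied here, matching "no total limit on SMTT side".
    return header + message
```

With a 10,000-character message, the result keeps only the last 7,168 message characters plus a short header, so it stays under the assumed 8K total as long as the header fits in the remaining 1K.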

Targeted for the PT 1.12 release; please do not merge into any existing DLC releases.

roywei, Jun 23 '22 23:06

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-training-toolkit-pr
  • Commit ID: 80e140e0e47fd678c41cf42f81e3884b80711212
  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

sagemaker-bot, Jun 23 '22 23:06

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-training-toolkit-pr
  • Commit ID: c0ff7152d4cc4d421d56e7c31241d85acccf6baa
  • Result: FAILED
  • Build Logs (available for 30 days)


sagemaker-bot, Jun 24 '22 00:06