sagemaker-pytorch-training-toolkit

Toolkit for running PyTorch training scripts on SageMaker. The Dockerfiles used to build the SageMaker PyTorch containers are at https://github.com/aws/deep-learning-containers.

12 sagemaker-pytorch-training-toolkit issues

Hello everyone, I'm very new to SageMaker and I'm facing a strange issue that I can't solve. **My goal**: I have created a CNN that I would like to...

**BUG Description** I'm trying to automate and scale a large collection of experiments using AWS SageMaker via the Python SDK. However, I am facing an error that does not give any...

**BUG Description** **I am facing an error that gives no indication of how to resolve it after migrating to SageMaker.** The code runs perfectly on my local machine....

# Patching CVE-2007-4559 Hi, we are security researchers from the Advanced Research Center at [Trellix](https://www.trellix.com). We have begun a campaign to patch a widespread bug named CVE-2007-4559. CVE-2007-4559 is a...
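For context, CVE-2007-4559 is the path-traversal issue in Python's `tarfile` module: `extract()`/`extractall()` will write members whose names contain `..` or are absolute paths outside the intended destination directory. The sketch below shows the kind of pre-extraction check such patches typically add. It is an illustration, not the actual patch proposed in the issue; the helper names `is_within_directory` and `safe_extract` are hypothetical.

```python
import os
import tarfile


def is_within_directory(directory: str, target: str) -> bool:
    """Return True if `target` resolves to a path inside `directory`."""
    abs_directory = os.path.realpath(directory)
    abs_target = os.path.realpath(target)
    # commonpath compares whole path components, unlike a plain string-prefix check.
    return os.path.commonpath([abs_directory, abs_target]) == abs_directory


def safe_extract(tar: tarfile.TarFile, path: str = ".") -> None:
    """Extract all members, refusing the archive if any member escapes `path`.

    Hypothetical helper: it checks member *names* only and does not inspect
    symlink/hardlink targets.
    """
    for member in tar.getmembers():
        member_path = os.path.join(path, member.name)
        if not is_within_directory(path, member_path):
            raise ValueError(f"Blocked path traversal in tar member: {member.name!r}")
    # Only reached if every member name stays inside `path`.
    tar.extractall(path)
```

This name-based check does not cover link members that point outside the destination. On Python versions that implement PEP 706, passing `filter="data"` to `extractall()` is the standard-library way to reject both cases.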

**Describe the bug** Torch does not find CUDA on a GPU instance with the official SageMaker training container. **To reproduce**
```
sudo docker pull 763104351884.dkr.ecr.eu-west-2.amazonaws.com/pytorch-training:1.10.0-gpu-py38-cu113-ubuntu20.04-sagemaker
sudo docker run -it --entrypoint /bin/bash 709fa9395949...
```

I'm trying to install torchaudio inside the PyTorch container and run into this error. Online forums suggest that multiple torch versions or CUDA issues lead to this error....

type: bug

**What did you find confusing?** In the [Dockerfile.gpu](https://github.com/aws/sagemaker-pytorch-training-toolkit/blob/master/docker/1.5.0/py3/Dockerfile.gpu), there is a point where torch and torchvision are uninstalled and replaced by specialized versions of both packages re-installed from:...

Would it be possible to have an example use case for this repository? Would I clone it while in SageMaker Studio? Would it be possible to build an image from...

type: question
type: documentation

*Issue #, if available:* *Description of changes:* By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

The entrypoint script for the containers is executed with monitor mode enabled (using the -m flag), e.g., here: https://github.com/aws/sagemaker-pytorch-container/blob/97e611b4cb2df13d966d508e56d1c990439b2163/docker/1.3.1/py3/Dockerfile.gpu#L166 This prints the following message at the start of any SageMaker job...

type: question