sagemaker-tensorflow-training-toolkit

Toolkit for running TensorFlow training scripts on SageMaker. Dockerfiles used for building SageMaker TensorFlow Containers are at https://github.com/aws/deep-learning-containers.

Results 10 sagemaker-tensorflow-training-toolkit issues

*Description of changes:* See title. By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

Trying to deploy a custom Word2Vec model that I've trained offline as a SageMaker endpoint. Followed the documentation - https://github.com/awslabs/amazon-sagemaker-examples/tree/master/advanced_functionality/scikit_bring_your_own to create docker file and everything. I've added the following...

type: question

Hi, this is my first time working with SageMaker. I successfully trained a model; however, I'm having difficulty getting it to output evaluation metrics to the log files. Here is...

type: question

For strategies like MultiWorkerMirroredStrategy, TF2 requires us to configure each node individually (https://www.tensorflow.org/tutorials/distribute/multi_worker_with_keras#multi-worker_configuration). Currently SageMaker does not provide us a way of doing this while trying to launch...

type: question
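The per-node configuration this question refers to is the `TF_CONFIG` environment variable described in the linked TensorFlow tutorial. Below is a minimal sketch of how a training script might derive it on each node. It assumes the container exposes the host list and the current host name through the `SM_HOSTS` and `SM_CURRENT_HOST` environment variables; the helper name `build_tf_config` and the port `2222` are hypothetical choices for illustration, not something the issue or the toolkit prescribes.

```python
import json
import os


def build_tf_config(hosts, current_host, port=2222):
    """Build a TF_CONFIG dict for a worker-only cluster.

    hosts: list of host names (assumed to come from SM_HOSTS).
    current_host: this node's host name (assumed to come from SM_CURRENT_HOST).
    port: hypothetical port chosen for illustration.
    """
    # Sort so every node derives the same worker ordering.
    ordered = sorted(hosts)
    return {
        "cluster": {"worker": [f"{host}:{port}" for host in ordered]},
        "task": {"type": "worker", "index": ordered.index(current_host)},
    }


if __name__ == "__main__":
    # Fall back to a single-node cluster when the variables are absent.
    hosts = json.loads(os.environ.get("SM_HOSTS", '["algo-1"]'))
    current_host = os.environ.get("SM_CURRENT_HOST", "algo-1")
    # TF_CONFIG must be set before the distribution strategy is created.
    os.environ["TF_CONFIG"] = json.dumps(build_tf_config(hosts, current_host))
```

Sorting the host list is a design choice in this sketch: it keeps the worker indices consistent across nodes regardless of the order in which the hosts are listed.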

*Issue #, if available:* *Description of changes:* By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

Raising here (where I believe the implementation is?) as opposed to on SageMaker SDK - which as I understand just documents the functionality. **Every other** SageMaker framework container that I've...

type: enhancement

Test integration ``` pytest test/integration/sagemaker/test_horovod.py --docker-base-name sm-tf-horovod-integration --tag latest --framework-version 1.15.0 --processor gpu ``` Error stacktrace: ``` sagemaker.exceptions.UnexpectedStatusException: Error for Training job test-tf-horovod-1591768266-74da: Failed. Reason: AlgorithmError: ExecuteUserScriptError: E Command...

type: question

I am trying to get to the bottom of a [problem #413](https://github.com/aws/sagemaker-python-sdk/issues/413) causing my deployed TensorFlow model to fail. The model is simple and deploys with basic instructions to...

# Patching CVE-2007-4559 Hi, we are security researchers from the Advanced Research Center at [Trellix](https://www.trellix.com). We have begun a campaign to patch a widespread bug named CVE-2007-4559. CVE-2007-4559 is a...

*Issue #, if available:* *Description of changes:* Update documentation. By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.