sagemaker-debugger

Amazon SageMaker Debugger saves tensors during the training of machine learning jobs and provides functionality to analyze those tensors.

Results 92 sagemaker-debugger issues

```
def test_save_shapes(out_dir, hook=None):
    hook_created = False
    if hook is None:
        hook_created = True
        global_reduce_config = ReductionConfig(save_raw_tensor=True)
        global_save_config = SaveConfig(save_steps=[0, 1])
        hook = t_hook(
            out_dir=out_dir,
            save_config=global_save_config,
            include_collections=[
                "weights",
                "biases",
                "gradients",...
```

I have a simple TensorFlow model which I'm training on SageMaker. It was working fine, but recently it has started crashing during training **right after the first checkpoint is saved**. Following...

Can you please confirm whether SageMaker Debugger works with HPO? Code that works perfectly fine with SM script mode fails with errors when extended to HPO. `...

question
FAQ

Identify the datasets used and cache them in S3: https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data into /root/tensorflow_datasets/downloads/

Pytest has been temporarily pinned due to stability issues with their latest release. Track the issue and revert the version pin when the issue is resolved. The issue is being...

Observing this at the end of a run of `pytest tests/tensorflow2`. Issue most probably introduced after PR https://github.com/awslabs/sagemaker-debugger/pull/259

```
--- Logging error ---
Traceback (most recent call last):
  File "/usr/lib/python3.6/logging/__init__.py",...
```

As a follow-up to PR https://github.com/awslabs/sagemaker-debugger/pull/279, add support for testing Horovod when TF 2.3 is released.

This log line confuses customers. Let's remove it: `[2020-06-20 21:49:37.529 algo-1:67 INFO utils.py:25] The end of training job file will not be written for jobs running under SageMaker.`
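Until the line is removed in the library itself, one possible workaround is to filter it out on the user side. The following is a minimal sketch using Python's standard `logging.Filter`, not the fix proposed in the issue. `DropMessageFilter` and `silence` are hypothetical helper names, and the name of the logger that emits the message is an assumption that would need to be checked against the library.

```python
import logging


class DropMessageFilter(logging.Filter):
    """Drop any log record whose rendered message contains `needle`."""

    def __init__(self, needle):
        super().__init__()
        self.needle = needle

    def filter(self, record):
        # Returning False tells the logging machinery to discard the record.
        return self.needle not in record.getMessage()


# Fragment of the message quoted in the issue.
NOISY_TEXT = "The end of training job file will not be written"


def silence(logger_name, needle=NOISY_TEXT):
    """Attach the filter to the named logger and return it for later removal.

    `logger_name` is an assumption: pass whatever name the emitting
    logger actually uses.
    """
    flt = DropMessageFilter(needle)
    logging.getLogger(logger_name).addFilter(flt)
    return flt
```

A filter attached to a logger only applies to records logged directly on that logger, not to records propagated up from its children, so it has to be attached to the logger that emits the message (or to the handler that prints it).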