sagemaker-debugger

Amazon SageMaker Debugger provides functionality to save tensors during machine learning training jobs and to analyze those tensors.

92 sagemaker-debugger issues

Referring to https://github.com/awslabs/sagemaker-debugger/blob/9d2d0c3fe9c7745d532e6ec1daae9e6c094394ca/smdebug/core/logger.py#L51: I get this error in every kind of PyTorch training job.

![Screen Shot 2021-01-08 at 7 36 22 PM](https://user-images.githubusercontent.com/1638254/104079586-88d7b580-51e9-11eb-81de-693aa0e96748.png)

I am using debugger hooks for a custom model built with the **keras-unet** library. This library is built on top of TensorFlow and Keras. I am trying to include a debugger hook for that and I...

I have a working PyTorch training script that runs on my local machine. I'm trying to set it up to run on SageMaker. I want to save data to TensorBoard,...

I am using a custom Docker image to run distributed training with PyTorch on SageMaker. The training script is taken from https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/Segmentation/MaskRCNN. The DLC image uses `pytorch-training:1.6.0-gpu-py3` as the base...

### Description of changes:
- This PR is in the draft stage; I still need to update tests, refactor, and add comments.
- Adds the ability to save nested layers with the...
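Saving nested layers means the hook has to visit sub-layers recursively instead of stopping at the top-level children. The following is a minimal pure-Python sketch of that traversal, assuming a hypothetical `Layer` class whose sub-layers live in a `children` dict; it is not the PR's implementation. In PyTorch, `nn.Module.named_modules()` already performs a similar walk with dotted names.

```python
from typing import Dict, Iterator, Tuple


class Layer:
    """Hypothetical stand-in for a framework layer that can contain sub-layers."""

    def __init__(self, **children: "Layer") -> None:
        self.children: Dict[str, "Layer"] = dict(children)


def iter_nested_layers(layer: Layer, prefix: str = "") -> Iterator[Tuple[str, Layer]]:
    """Yield (dotted_name, layer) for every descendant, depth first."""
    for name, child in layer.children.items():
        full_name = f"{prefix}.{name}" if prefix else name
        yield full_name, child
        # Recurse so that layers nested at any depth are visited too.
        yield from iter_nested_layers(child, full_name)


model = Layer(encoder=Layer(conv1=Layer(), conv2=Layer()), head=Layer())
names = [name for name, _ in iter_nested_layers(model)]
print(names)  # ['encoder', 'encoder.conv1', 'encoder.conv2', 'head']
```

The dotted names give each nested layer a unique key, which is one reasonable way to name saved tensors so that `encoder.conv1` and a hypothetical top-level `conv1` do not collide.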

### Description of bug:
- Users can wrap their modules with helper classes like `DataParallelCriterion` or `DataParallel`:
  ```python
  # CustomLossModule is a user-defined loss; the helper class wraps it.
  custom_loss_module = CustomLossModule()
  parallel_custom_loss_module = DataParallelCriterion(custom_loss_module)
  ```
- The smdebug hook register...
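`torch.nn.DataParallel` stores the wrapped network in its `.module` attribute, and this sketch assumes `DataParallelCriterion` follows the same pattern. Code that inspects only the outer object therefore sees the wrapper type rather than `CustomLossModule`. Below is a hypothetical `unwrap` helper that follows `.module` chains, written with plain-Python stand-ins so it runs without PyTorch; it illustrates the idea and is not smdebug's fix.

```python
class CustomLossModule:
    """Stand-in for a user-defined loss module."""


class Wrapper:
    """Stand-in for a wrapper such as DataParallel, which keeps the inner module in `.module`."""

    def __init__(self, module):
        self.module = module


def unwrap(module):
    """Follow `.module` attributes until reaching the innermost object (hypothetical helper)."""
    while hasattr(module, "module"):
        module = module.module
    return module


wrapped = Wrapper(Wrapper(CustomLossModule()))
inner = unwrap(wrapped)
print(type(inner).__name__)  # CustomLossModule
```

Unwrapping in a loop rather than once handles the case where a wrapper is itself wrapped.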

### Description of changes:
#### Style and formatting:
I have run `pre-commit install` to ensure that auto-formatting happens with every commit.
#### Issue number, if available
By submitting this pull...

### Description of changes:
- If the customer overrides the forward pass function of a loss module, ZCC has no effect.

Note: The first commit only reproduces the error...
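One generic way instrumentation gets bypassed: if a framework patches `forward` on a base class, a subclass that overrides `forward` never reaches the patched version. The sketch below demonstrates that Python behavior with hypothetical stand-in classes (`Loss`, `CustomLoss`, `instrument`). Whether smdebug's ZCC actually patches loss modules this way is an assumption; the PR description does not say.

```python
calls = []


class Loss:
    """Stand-in for a framework loss base class."""

    def forward(self, x):
        return x


class CustomLoss(Loss):
    """User subclass that overrides forward."""

    def forward(self, x):
        return x * 2


def instrument(cls):
    """Replace cls.forward with a wrapper that records each call (hypothetical)."""
    original = cls.forward

    def wrapped(self, x):
        calls.append(type(self).__name__)
        return original(self, x)

    cls.forward = wrapped


instrument(Loss)         # patch the base class only
Loss().forward(1)        # recorded: Loss.forward is now the wrapper
CustomLoss().forward(1)  # not recorded: the subclass override shadows the patch
print(calls)  # ['Loss']
```

Attaching instrumentation per instance (for example, via PyTorch's `register_forward_hook`, which fires from `__call__` regardless of how `forward` is defined) avoids depending on which class defines `forward`.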