sagemaker-debugger
Amazon SageMaker Debugger provides functionality to save tensors during training of machine learning jobs and analyze those tensors
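The save-then-analyze workflow described above can be illustrated with a small standard-library sketch. This is not the smdebug API: `StepRecorder`, `load_series`, and `run_demo` are hypothetical names invented for this example, and plain Python floats stand in for tensors. It only shows the general idea of writing named values at each training step and reading them back afterwards.

```python
import json
import tempfile
from pathlib import Path


class StepRecorder:
    """Hypothetical helper (not part of smdebug): one JSON file per training step."""

    def __init__(self, out_dir):
        self.out_dir = Path(out_dir)
        self.out_dir.mkdir(parents=True, exist_ok=True)

    def save(self, step, tensors):
        # tensors: dict mapping a tensor name to a list of floats
        path = self.out_dir / f"step_{step:06d}.json"
        path.write_text(json.dumps(tensors))


def load_series(out_dir, name):
    """Return [(step, values), ...] for one tensor name, ordered by step."""
    series = []
    for path in sorted(Path(out_dir).glob("step_*.json")):
        step = int(path.stem.split("_")[1])
        data = json.loads(path.read_text())
        if name in data:
            series.append((step, data[name]))
    return series


def run_demo():
    """Train a one-parameter toy model, saving its weight and gradient each step."""
    out_dir = tempfile.mkdtemp()
    recorder = StepRecorder(out_dir)
    weight = 4.0
    for step in range(5):
        grad = 2.0 * weight      # gradient of the toy loss weight**2
        weight -= 0.25 * grad    # plain gradient-descent update
        recorder.save(step, {"weight": [weight], "grad": [grad]})
    # Analysis phase: read the saved gradients back in step order
    return load_series(out_dir, "grad")
```

Calling `run_demo()` returns the recorded gradient series, which can then be inspected for problems such as values that stop shrinking or grow without bound.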
https://github.com/awslabs/sagemaker-debugger/blob/9d2d0c3fe9c7745d532e6ec1daae9e6c094394ca/smdebug/core/logger.py#L51 I get this error in every kind of PyTorch training job
I am using debugger hooks for a custom model built with the **keras-unet** library. This library is built on TensorFlow and Keras. I am trying to include a debugger hook for that and I...
I have a working PyTorch training script, which runs on my local machine. I'm trying to set it up to run on SageMaker. I want to save data to TensorBoard,...
I am using a custom docker image to run distributed training with PyTorch on SageMaker. The training script is taken from https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/Segmentation/MaskRCNN. The DLC Image uses `pytorch-training:1.6.0-gpu-py3` as the base...
### Description of changes:
- This PR is in the draft stage; I still need to update tests, refactor, and add comments.
- Adds ability to save nested layers with the...
### Description of bug:
- Users can wrap their modules with helper classes like `DataParallelCriterion` or `DataParallel`
```
custom_loss_module = CustomLossModule()
parallel_custom_loss_module = DataParallelCriterion(custom_loss_module)
```
- The smdebug hook register...
### Description of changes:
#### Style and formatting:
I have run `pre-commit install` to ensure that auto-formatting happens with every commit.
#### Issue number, if available
By submitting this pull...
### Description of changes:
- If the customer overrides the forward pass function of a Loss module, then ZCC has no effect.

Note: The first commit only reproduces the error...