sagemaker-debugger
Amazon SageMaker Debugger saves tensors during machine learning training jobs and provides functionality to analyze those tensors.
### Description of changes: (Unable to reopen #463 so I'm creating a new PR). For each step, we need to determine if the profiler config JSON has changed, and if...
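One way to picture the per-step check described in this pull request is to fingerprint the config file and compare it against the fingerprint from the previous step. The sketch below is illustrative only and is not taken from the smdebug code base: `ConfigWatcher`, `has_changed`, and the file name `profiler_config.json` are hypothetical, and it assumes the config lives in a single JSON file on disk.

```python
import hashlib
import json
import os
import tempfile


class ConfigWatcher:
    """Hypothetical helper: reports whether a config file changed since the last check."""

    def __init__(self, path):
        self.path = path
        self._last_digest = None  # digest seen on the previous call

    def _digest(self):
        # A missing file is treated as "no config" and represented by None.
        try:
            with open(self.path, "rb") as f:
                return hashlib.sha256(f.read()).hexdigest()
        except FileNotFoundError:
            return None

    def has_changed(self):
        """Return True if the file content differs from the previous call."""
        digest = self._digest()
        changed = digest != self._last_digest
        self._last_digest = digest
        return changed


def demo():
    """Simulate four training steps with config edits in between."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "profiler_config.json")  # hypothetical file name
        watcher = ConfigWatcher(path)
        results = [watcher.has_changed()]       # step 1: no file yet
        with open(path, "w") as f:
            json.dump({"enabled": True}, f)
        results.append(watcher.has_changed())   # step 2: file was created
        results.append(watcher.has_changed())   # step 3: nothing changed
        with open(path, "w") as f:
            json.dump({"enabled": False}, f)
        results.append(watcher.has_changed())   # step 4: content was edited
        return results
```

Hashing the file content rather than comparing modification times avoids false positives when the file is rewritten with identical content, at the cost of reading the file on every step.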
### Description of changes: #### Style and formatting: I have run `pre-commit install` to ensure that auto-formatting happens with every commit. #### Issue number, if available By submitting this pull...
### Description of changes: - Reverts two commits that had disabled PyTorch 1.7 ZCC tests #### Style and formatting: I have run `pre-commit install` to ensure that auto-formatting happens with...
### Description of changes: - Remove debugger checkpointing capability to build a test binary. #### Style and formatting: I have run `pre-commit install` to ensure that auto-formatting happens with every...
### Description of changes: - If the `metadata.json` file is empty, the hook can crash. #### Style and formatting: I have run `pre-commit install` to ensure...
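The failure mode in this pull request can be illustrated in general terms: Python's `json.loads` raises `json.JSONDecodeError` on an empty string, so any code that parses an empty file without a guard will fail. The sketch below shows one defensive pattern under that assumption. It is not the change made in the pull request, and `load_json_or_default` is a hypothetical helper name.

```python
import json
import os
import tempfile


def load_json_or_default(path, default=None):
    """Hypothetical helper: parse a JSON file, returning `default`
    if the file is missing, empty, or not valid JSON."""
    default = {} if default is None else default
    try:
        with open(path, "r") as f:
            text = f.read()
    except FileNotFoundError:
        return default
    if not text.strip():
        # Empty or whitespace-only file: json.loads would raise here.
        return default
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return default


def demo():
    """Exercise the helper on an empty, a valid, and a missing file."""
    with tempfile.TemporaryDirectory() as tmp:
        empty_path = os.path.join(tmp, "metadata.json")
        open(empty_path, "w").close()  # create an empty file

        valid_path = os.path.join(tmp, "valid.json")
        with open(valid_path, "w") as f:
            json.dump({"step": 3}, f)

        missing_path = os.path.join(tmp, "missing.json")
        return (
            load_json_or_default(empty_path),
            load_json_or_default(valid_path),
            load_json_or_default(missing_path, default={"step": 0}),
        )
```

Returning a default keeps the caller running, but it also hides corrupt files; a real implementation might prefer to log a warning before falling back.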
I have a basic question that I cannot find an answer to in the docs. If I understand correctly, the documentation says that we do not have to...
I am passing batches into a `tf.GradientTape()` loop and calling my model for a prediction. When I call `tape.gradient(prediction, batch)`, smdebug throws "ValueError: The truth value of an array with...
### Description of changes: This commit enables the profiler in TF2 native training (design doc: https://quip-amazon.com/v0MwAkTizZl9/Profiler-for-TensorFlow2-native-training). The corresponding integration tests for TF 2.2 and 2.3 passed successfully. TF2.2...
I'm bringing my own PyTorch training script, and I'm interested in using SM Debugger to profile function calls in my training jobs. The [API Glossary](https://github.com/awslabs/sagemaker-debugger/blob/master/docs/api.md#glossary) states: > Step: Step means...