sagemaker-debugger

Amazon SageMaker Debugger provides functionality to save tensors during the training of machine learning jobs and to analyze those tensors.

Results 92 sagemaker-debugger issues

```
if self._prepared_tensors[mode]:
    if self._exported_collections is False:
        # in keras, these collections change when mode changes
        # but rest of the project isn't yet capable of handling this
        # this...
```

Hi, Tom from [Codecov](https://codecov.io) here. We noticed that you are using Codecov with fairly high frequency, and we’re so excited to see that! However, because you are not using our...

Follow-up from https://github.com/awslabs/sagemaker-debugger/pull/225

If you see the error messages `E ModuleNotFoundError: No module named 'smdebug.core.tfevent.proto.types_pb2'1` or `ERROR: Compiling summary protocol buffers failed. You will not be able to use smdebug. Please make sure...

documentation
FAQ
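A quick way to check for the missing-module condition quoted above is to ask the import system whether the generated module can be located. The following is only a minimal sketch: the helper name `module_available` is hypothetical and not part of smdebug, and the check says nothing about why the protocol buffers failed to compile.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if the import system can locate `name` (hypothetical helper)."""
    try:
        # For dotted names, find_spec imports the parent packages first.
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package (for example `smdebug` itself) is missing.
        return False


if __name__ == "__main__":
    # Module name taken from the error message quoted in the issue.
    target = "smdebug.core.tfevent.proto.types_pb2"
    if module_available(target):
        print(f"{target} can be located")
    else:
        print(f"{target} is missing; the protocol buffers may not have been compiled")
```

Because `find_spec` imports parent packages, an unrelated error raised while importing `smdebug` itself would still propagate; only `ModuleNotFoundError` is handled here.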

### Description of changes:

- Updated docs: `docs/distributed_training.md`
- Currently still WIP: Need to add docs and examples for XGBoost
- More examples to cover missing cases.
- TF...

These warnings are seen when running a TF 2.x mirrored strategy job, although the training seems to succeed. Root-cause the source of these warnings.

```
2020-04-21 20:41:50.531730: W tensorflow/core/kernels/data/cache_dataset_ops.cc:822]...
```

Loss functional values are saved twice for each step with AWS PyTorch. This happens because functional losses are saved by default by the `post_hook_for_loss_functional` function in AWS PyTorch. ``` ......

Events produced are empty when using `include_collections=["all"]`. It works with `save_all=True`. Hook code:

    save_config = smd.SaveConfig(save_interval=1)
    reduction_config = smd.ReductionConfig(["max", "min"])
    hook = smd.Hook(out_dir='...',
                    reduction_config=reduction_config,
                    save_all=True,
                    #include_collections=["all"],
                    export_tensorboard=True,
                    save_config=save_config,
                    tensorboard_dir='...',
                    include_workers="all")

A save-raw-tensor API needs to be implemented. Use case: the user wants to save a tensor that is not part of the model graph. The implementation would look like...

enhancement

Q. In the example I created, I needed to save data that is not part of the model training. I did this by directly calling `hook._write_raw_tensor_simple()`, which worked fine. But...

documentation
question
wontfix
FAQ