Vikas-kum

Results 17 issues of Vikas-kum

Hi, I have a stream and the current code is repeating the processing of data in streams when restarted. I wanted to make sure that there is some checkpoint written...

Issue: https://github.com/awslabs/sagemaker-debugger/issues/321 ### Description of changes: Add filtering logic to dest_names to make sure that we always ask for a subgraph that is present in the graph def. #### Style and formatting:...

This log line confuses customers. Let's remove it. [2020-06-20 21:49:37.529 algo-1:67 INFO utils.py:25] The end of training job file will not be written for jobs running under SageMaker.

Follow-up from this PR: https://github.com/awslabs/sagemaker-debugger/pull/225

The save raw tensor API needs to be implemented. Use case: a user wants to save a tensor that is not part of the model graph. The implementation would look like...

enhancement

Q. In the example I created, I needed to save data that is not part of the model training. I did this by calling directly hook._write_raw_tensor_simple(), which worked fine. But...

documentation
question
wontfix
FAQ

Q. When creating a custom collection, is there a way to define the EVAL/TRAIN save_interval directly in the SageMaker Estimator? ANS: Yes, it can be provided; for details, see this section...

documentation
wontfix
FAQ

In CI: https://console.aws.amazon.com/cloudwatch/home?region=us-east-1#logEventViewer:group=DO-NOT-DELETE-smdebug_rules-LOGS-ONE-REPO;stream=codebuild/c3bda538-9277-42db-931a-de5984013923;filter=%22Loaded%20Index%20Files:%20upload/20200106_221841/c33ae10/s3_trials/trial_loss_not_decreasing_tf_true_parallel_mode_1578351365.7939517/index/000000000/000000000070_worker_0.json%22 Why is this line repeated so many times: "Loaded Index Files: upload/20200106_221841/c33ae10/s3_trials/trial_loss_not_decreasing_tf_true_parallel_mode_1578351365.7939517/index/000000000/000000000070_worker_0.json"? Are we reloading the index files again and again? @NihalHarish Please check and confirm.

Come up with a way for CI to print the running time of each test. Find which integration tests run the longest and optimize them to run faster....
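One way to approach the request above, as a minimal sketch rather than what the project actually adopted: time each test callable and print the slowest first. The names `time_tests`, `fast_test`, and `slow_test` are hypothetical and exist only for illustration. If the suite is run with pytest (an assumption), its built-in `--durations=N` option reports the N slowest tests without any extra code.

```python
import time
from typing import Callable, Dict, List, Tuple


def time_tests(tests: Dict[str, Callable[[], None]]) -> List[Tuple[str, float]]:
    """Run each test callable and return (name, seconds), slowest first."""
    timings = []
    for name, test in tests.items():
        start = time.perf_counter()
        test()
        timings.append((name, time.perf_counter() - start))
    # Sort by elapsed time so the longest-running tests appear at the top.
    return sorted(timings, key=lambda item: item[1], reverse=True)


# Hypothetical stand-ins for real integration tests.
def fast_test() -> None:
    assert 1 + 1 == 2


def slow_test() -> None:
    time.sleep(0.05)  # simulate a long-running integration test


report = time_tests({"fast_test": fast_test, "slow_test": slow_test})
for name, seconds in report:
    print(f"{seconds:8.3f}s  {name}")
```

Sorting slowest-first keeps the candidates for optimization at the top of the CI log, which is the part of the output most likely to be read.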