Enable the profiler in TF2 native training
Description of changes:
This commit enables the profiler in TensorFlow 2 native training (design doc: https://quip-amazon.com/v0MwAkTizZl9/Profiler-for-TensorFlow2-native-training). The corresponding integration tests for TF 2.2 and TF 2.3 passed:
- TF 2.2 integration test: https://console.aws.amazon.com/codesuite/codebuild/072677473360/projects/smprofiler_tf2_integration_tests/build/smprofiler_tf2_integration_tests%3A2bff3f63-b797-4c5e-9992-0fdf17f13bec?region=us-east-1
- TF 2.3 integration test: https://console.aws.amazon.com/codesuite/codebuild/072677473360/projects/smprofiler_tf_2_3_integration_tests/build/smprofiler_tf_2_3_integration_tests%3A801a9a02-30b5-4af6-9835-9b01a6ed6ce4/?region=us-east-1
The changes include:
- Added profiling_start_batch(), profiling_end_batch(), and profiling_end() functions in keras.py to enable profiling in the native training loop.
- Added python_profiler as a KerasHook attribute so that the Python profiler is easier to manage and to test.
- Added is_profiler_native_training (defaults to False) as a KerasHook attribute that indicates whether the profiler is enabled in TensorFlow 2 native training. It is used to handle the three use cases: debugger only, profiler only, and both debugger and profiler.
- Added a _decrement_step() function that decrements the step number when both the profiler and the debugger are enabled. In this case, the step number is first incremented by 1 inside profiling_start_batch() and then decremented by 1 inside wrap_tape() before _wrap_tape_push() is called, so that the debugger code inside _wrap_tape_push() can stay unchanged.
- Added _handle_start_python_profiling(), _handle_end_python_profiling(), _handle_start_detailed_profiling(), _handle_end_detailed_profiling(), _handle_start_dataloader_profiling(), and _handle_end_dataloader_profiling() methods in keras.py to reduce duplicated code.
- Updated the _increment_step() function in hook.py so that incrementing the step and writing the state can be done separately.
- Added unit tests for the profiler-only and profiler + debugger use cases.
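For readers unfamiliar with the native (non-`fit`) workflow, the call order implied by the bullets above can be sketched with a stand-in hook. `RecordingHook` and `train` are hypothetical and only record calls; the real KerasHook signatures are not given in this description, so this is a sketch of the ordering only, assuming one profiling_start_batch()/profiling_end_batch() pair per batch and a single profiling_end() after the loop.

```python
class RecordingHook:
    """Hypothetical stand-in for KerasHook that only records call order."""

    def __init__(self):
        self.calls = []

    def profiling_start_batch(self):
        self.calls.append("start_batch")

    def profiling_end_batch(self):
        self.calls.append("end_batch")

    def profiling_end(self):
        self.calls.append("end")


def train(hook, batches):
    """Toy native training loop: each batch is bracketed by start/end calls."""
    for batch in batches:
        hook.profiling_start_batch()
        # A real loop would run the forward pass inside hook.wrap_tape(...)
        # and apply gradients here; this toy just touches the batch.
        _ = sum(batch)
        hook.profiling_end_batch()
    # Assumption: profiling_end() is called once, after the last batch.
    hook.profiling_end()


hook = RecordingHook()
train(hook, [[1, 2], [3, 4]])
print(hook.calls)
```

Bracketing every batch keeps per-step profiling windows aligned with the training step, while the single profiling_end() gives the hook one place to flush and close its writers.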
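The step bookkeeping described for _decrement_step() can be modeled with a toy class to show why every configuration ends up advancing the step exactly once per batch. `ToyHook`, its boolean flags (standing in for is_profiler_native_training and debugger enablement), and `run_batches` are hypothetical; the sketch assumes that _wrap_tape_push() increments the step on the debugger path, as the bullet implies.

```python
class ToyHook:
    """Hypothetical model of the step bookkeeping; not the real KerasHook."""

    def __init__(self, profiler_enabled: bool, debugger_enabled: bool):
        self.profiler_enabled = profiler_enabled
        self.debugger_enabled = debugger_enabled
        self.step = -1  # assumption: the first batch becomes step 0

    def _increment_step(self):
        self.step += 1

    def _decrement_step(self):
        self.step -= 1

    def profiling_start_batch(self):
        # Profiler path: advance the step at the start of every batch.
        if self.profiler_enabled:
            self._increment_step()

    def _wrap_tape_push(self):
        # Unchanged debugger path: it advances the step on its own.
        if self.debugger_enabled:
            self._increment_step()

    def wrap_tape(self):
        # Both enabled: undo the profiler's increment so the debugger's
        # increment in _wrap_tape_push() does not count the batch twice.
        if self.profiler_enabled and self.debugger_enabled:
            self._decrement_step()
        self._wrap_tape_push()


def run_batches(hook: ToyHook, num_batches: int) -> int:
    """Simulate num_batches native-training batches; return the final step."""
    for _ in range(num_batches):
        hook.profiling_start_batch()
        hook.wrap_tape()
    return hook.step


for profiler, debugger in [(True, False), (False, True), (True, True)]:
    print(profiler, debugger, run_batches(ToyHook(profiler, debugger), 3))
```

After three batches each configuration reports step 2, which is the property the decrement preserves without touching the debugger's push logic.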
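The _increment_step() change separates advancing the step from writing state. The description does not give the new signature, so the `write_state` keyword below is a hypothetical way to express that split, shown on a toy base hook rather than the real class in hook.py.

```python
class ToyBaseHook:
    """Hypothetical base hook; only illustrates the increment/write split."""

    def __init__(self):
        self.step = 0
        self.written_states = []  # steps at which state was written

    def _write_state(self):
        # Stand-in for persisting training state for the current step.
        self.written_states.append(self.step)

    def _increment_step(self, write_state: bool = True):
        # Hypothetical flag: callers such as the profiler path can advance
        # the step without also triggering a state write.
        self.step += 1
        if write_state:
            self._write_state()


hook = ToyBaseHook()
hook._increment_step()                   # advances and writes state
hook._increment_step(write_state=False)  # advances only
print(hook.step, hook.written_states)
```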
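The _handle_start_*/_handle_end_* helpers let profiling_start_batch() and profiling_end_batch() act as short dispatchers. The toy below collapses the three named pairs into one parameterized pair keyed by profiler name, which is a simplification of the per-profiler methods listed above; `ProfilerWindow` (profile steps in a half-open range) and `ToyProfilingHook` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ProfilerWindow:
    """Hypothetical config: profile steps [start, start + num_steps)."""
    start: int
    num_steps: int

    def should_start(self, step: int) -> bool:
        return step == self.start

    def should_end(self, step: int) -> bool:
        return step == self.start + self.num_steps - 1


@dataclass
class ToyProfilingHook:
    windows: dict  # profiler name -> ProfilerWindow
    events: list = field(default_factory=list)

    def _handle_start(self, name: str, step: int):
        if self.windows[name].should_start(step):
            self.events.append((step, f"start_{name}"))

    def _handle_end(self, name: str, step: int):
        if self.windows[name].should_end(step):
            self.events.append((step, f"end_{name}"))

    def profiling_start_batch(self, step: int):
        # Dispatcher: each profiler type decides on its own whether to start.
        for name in self.windows:
            self._handle_start(name, step)

    def profiling_end_batch(self, step: int):
        for name in self.windows:
            self._handle_end(name, step)


hook = ToyProfilingHook({
    "python": ProfilerWindow(start=1, num_steps=2),
    "detailed": ProfilerWindow(start=0, num_steps=1),
    "dataloader": ProfilerWindow(start=2, num_steps=1),
})
for step in range(4):
    hook.profiling_start_batch(step)
    hook.profiling_end_batch(step)
print(hook.events)
```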
Style and formatting:
I have run pre-commit install to ensure that auto-formatting happens with every commit.
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.
Codecov Report
Merging #420 (f704e48) into master (6788e32) will decrease coverage by 14.18%. The diff coverage is 5.68%.
Coverage Diff:

| | master | #420 | +/- |
|---|---|---|---|
| Coverage | 76.91% | 62.72% | -14.19% |
| Files | 113 | 113 | |
| Lines | 10195 | 10237 | +42 |
| Hits | 7841 | 6421 | -1420 |
| Misses | 2354 | 3816 | +1462 |
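The coverage percentages in the diff follow from the hit and miss counts. The check below recomputes them, assuming no partially covered lines (consistent with Hits + Misses = Lines in both columns); `coverage_pct` is a hypothetical helper, not part of Codecov.

```python
def coverage_pct(hits: int, misses: int) -> float:
    """Line coverage in percent, assuming no partially covered lines."""
    return 100.0 * hits / (hits + misses)


master = coverage_pct(hits=7841, misses=2354)  # 10195 lines
pr = coverage_pct(hits=6421, misses=3816)      # 10237 lines

print(f"master: {master:.2f}%")  # master: 76.91%
print(f"#420:   {pr:.2f}%")      # #420:   62.72%
print(f"change: {pr - master:+.2f} percentage points")
```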
| Impacted Files | Coverage Δ | |
|---|---|---|
| smdebug/tensorflow/keras.py | 0.00% <0.00%> (-90.10%) | :arrow_down: |
| smdebug/core/hook.py | 89.33% <100.00%> (-4.56%) | :arrow_down: |
| smdebug/tensorflow/__init__.py | 0.00% <0.00%> (-100.00%) | :arrow_down: |
| smdebug/tensorflow/constants.py | 0.00% <0.00%> (-100.00%) | :arrow_down: |
| smdebug/tensorflow/singleton_utils.py | 0.00% <0.00%> (-100.00%) | :arrow_down: |
| smdebug/tensorflow/collection.py | 0.00% <0.00%> (-95.88%) | :arrow_down: |
| smdebug/tensorflow/session.py | 0.00% <0.00%> (-91.83%) | :arrow_down: |
| smdebug/tensorflow/tensor_ref.py | 0.00% <0.00%> (-88.71%) | :arrow_down: |
| smdebug/tensorflow/utils.py | 0.00% <0.00%> (-87.62%) | :arrow_down: |
| ... and 30 more | | |
Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 6788e32...f704e48.