flytekit
[Core feature] Add PyTorch Memory Profiling Deck Renderer
Tracking issue
Reference Issue
Why are the changes needed?
Hugging Face announced a PyTorch memory visualizer that could be valuable for debugging memory-related issues in Flyte tasks, especially for failed executions. This implementation provides an interactive way to visualize PyTorch profiling data directly in Flyte Decks, making memory issues easier to diagnose.
What changes were proposed in this pull request?
- Added `PyTorchProfilingRenderer` class for visualizing PyTorch profiling data
- Implemented multiple visualization types:
  - Memory usage visualization
  - Execution timeline
  - Memory segment analysis
  - Profile visualization
  - Comparison between snapshots
- Added comprehensive test suite
- Added memory analysis capabilities for failed executions
- Added support for loading profiling data from pickle files
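As a rough illustration of the pickle-loading item above, the sketch below loads a snapshot file and checks its top-level shape before anything is rendered. It is not the PR's code: `load_snapshot` and `SnapshotError` are hypothetical names, and the required keys `segments` and `device_traces` are an assumption based on the snapshots PyTorch's CUDA memory tooling writes.

```python
import pickle
from pathlib import Path
from typing import Any

# Assumed top-level keys of a PyTorch CUDA memory snapshot (not taken from the PR).
REQUIRED_KEYS = {"segments", "device_traces"}


class SnapshotError(ValueError):
    """Hypothetical error type: the snapshot could not be loaded or looks malformed."""


def load_snapshot(path: str) -> dict[str, Any]:
    """Load a pickled memory snapshot and validate its top-level structure."""
    file = Path(path)
    if not file.is_file():
        raise SnapshotError(f"snapshot file not found: {path}")
    try:
        # pickle can execute arbitrary code, so only load files you produced yourself.
        with file.open("rb") as fh:
            data = pickle.load(fh)
    except (pickle.UnpicklingError, EOFError, AttributeError, ImportError, IndexError) as exc:
        raise SnapshotError(f"could not unpickle {path}: {exc}") from exc
    if not isinstance(data, dict):
        raise SnapshotError(f"expected a dict, got {type(data).__name__}")
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise SnapshotError(f"snapshot is missing keys: {sorted(missing)}")
    return data
```

Raising a single error type with a descriptive message lets a renderer show the failure in the deck instead of crashing the task a second time, which matters most for the failed-execution case.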
How was this patch tested?
- Added unit tests for all visualization types
- Tested with real profiling data
- Verified visualizations in both success and failure scenarios
- Added test cases for error handling and edge cases
- All tests are passing in the test suite
Setup process
Screenshots
Check all the applicable boxes
- [x] All new and existing tests passed.
- [x] All commits are signed off.
Related PRs
Docs link
Summary by Bito
This PR introduces and enhances the PyTorch Memory Profiling Deck Renderer that enables visualization of memory usage in Flyte tasks. The implementation provides interactive visualizations including memory usage timeline, segment analysis, profile visualization, and snapshot comparisons. The enhancements include comprehensive error handling for pickle deserialization, improved validation of profiling data structures, and robust subprocess execution controls with better error messages and proper temporary file handling.
Unit tests added: True
Estimated effort to review (1-5, lower is better): 4
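The subprocess controls and temporary file handling mentioned in the summary might look roughly like the sketch below. The command the PR actually invokes is not shown here, so `run_to_html` is a hypothetical helper that takes any command writing HTML to a path passed as its final argument. It applies a timeout, surfaces stderr on failure, and cleans up the temporary directory.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_to_html(args: list[str], timeout_s: float = 60.0) -> str:
    """Run a command that writes HTML to the path appended as its last argument."""
    # The temporary directory and everything in it is removed when the block exits.
    with tempfile.TemporaryDirectory() as tmp:
        out_path = Path(tmp) / "out.html"
        try:
            result = subprocess.run(
                [*args, str(out_path)],
                capture_output=True,
                text=True,
                timeout=timeout_s,
                check=False,
            )
        except subprocess.TimeoutExpired as exc:
            raise RuntimeError(f"command timed out after {timeout_s}s") from exc
        if result.returncode != 0:
            raise RuntimeError(
                f"command failed with exit code {result.returncode}: {result.stderr.strip()}"
            )
        if not out_path.is_file():
            raise RuntimeError("command succeeded but wrote no output file")
        return out_path.read_text(encoding="utf-8")


if __name__ == "__main__":
    # Stand-in command for illustration only: a Python one-liner that writes a tiny HTML file.
    writer = [sys.executable, "-c", "import sys; open(sys.argv[1], 'w').write('<p>ok</p>')"]
    print(run_to_html(writer))
```

Passing the arguments as a list rather than a shell string avoids quoting problems with file paths, and reading the output back before the `with` block exits means no temporary files outlive the call.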