Lucas Pasqualin
#### Context

**We've pivoted this PR from an RFC on torchtune + dcp UX to a PR which showcases async saving benchmarks.** If consensus is reached that we want...
~Users may have custom use cases for the `strict` parameter in load. In my mind, if we automatically call `state_dict` and `load_state_dict` in save/load, we need to support the same...
Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom):
* __->__ #124944
* #124939
* #122965

This logic is specific to FilesystemWriter, and now has a better place to live due to the...
Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom):
* #124944
* __->__ #124939
* #122965

Differential Revision: [D56575987](https://our.internmc.facebook.com/intern/diff/D56575987/)

cc @mrshenli @pritamdamania87 @zhaojuanmao @satgera @rohan-varma @gqchen @aazzolini @osalpekar @jiayisuse @H-Huang @kwen2501 @awgu @penguinwu...
Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom):
* #124944
* #124939
* __->__ #122965

Differential Revision: [D55493240](https://our.internmc.facebook.com/intern/diff/D55493240/)

*This PR is now ready for merge and is not an RFC*

Major choices...
This PR seeks to increase observability of save/load requests. This is accomplished with two main changes:

1. The creation of save_id and load_id:
   - a save_id and load_id is added...
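To make the idea of a per-request identifier concrete, here is a minimal sketch in plain Python. It is not the PR's implementation: `CheckpointRequest`, `log_event`, and the logger name are hypothetical, and the only thing taken from the description above is that each save or load request carries its own id that can be attached to log lines.

```python
import logging
import uuid
from dataclasses import dataclass, field

# Hypothetical logger name, chosen only for this sketch.
logger = logging.getLogger("checkpoint_sketch")


def _new_id() -> str:
    """Generate a unique identifier for one save or load request."""
    return uuid.uuid4().hex


@dataclass
class CheckpointRequest:
    """Hypothetical record of a single save/load call (not a PyTorch class)."""

    kind: str  # "save" or "load"
    path: str
    request_id: str = field(default_factory=_new_id)


def log_event(request: CheckpointRequest, event: str) -> str:
    """Tag a log line with the request's id so related events can be correlated."""
    msg = f"{request.kind}_id={request.request_id} path={request.path} event={event}"
    logger.info(msg)
    return msg
```

Because every message emitted for one request shares the same id, log lines from concurrent saves and loads can be grouped after the fact by filtering on `save_id=` or `load_id=`.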
## Description Please read our [CONTRIBUTING.md](https://github.com/pytorch/PiPPy/blob/main/CONTRIBUTING.md) prior to creating your first pull request. Please include a summary of the feature or issue being fixed. Please also include relevant motivation and...
Summary: Distributed State Dict is the approach PyTorch currently suggests for ensuring that the state dicts of parallelized models are compatible with save/load in single-process or re-sharding scenarios. This diff updates...