Chien-Chin Huang
Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom): * __->__ #92184 The current design of FSDP only supports NamedOptimizer/KeyedOptimizer when use_orig_params is True. This PR adds support even when use_orig_params is False....
[FSDP][optim_state_dict][9/N] Rewrite the all-gather flow of optimizer state to support older GPUs
Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom): * __->__ #91343
Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom): * __->__ #92118 Make optim_state_dict and optim_state_dict_to_load public APIs and consolidate them with state_dict by using the same state_dict_type to decide how to perform...
Summary: Print out a more useful error message for optim_state_dict. Test Plan: CI. Reviewed By: wz337. Differential Revision: D43556073
This PR uses shared memory to perform asynchronous checkpointing in another process and also implements asynchronous staging (overlapping staging with the next iteration).
Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom): * #125339 * #125338 * #125337 * #125336 * __->__ #125335 * #125334 * #125333 Summary: Right now DCP only unflattens a container if...
Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom): * #125339 * __->__ #125338 * #125337 * #125336 * #125335 * #125334 * #125333 Summary: This is useful if users would like to...
Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom): * #125339 * #125338 * #125337 * __->__ #125336 * #125335 * #125334 * #125501 Summary: distributed_state_dict should not try to use `getattr` to...
Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom): * __->__ #125339 * #125338 * #125337 * #125336 * #125335 * #125334 * #125333 Summary: This is useful if users would like to...
Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom): * #125339 * #125338 * #125337 * #125336 * #125335 * __->__ #125334 * #125333 Summary: If an object only exists on certain non-coordinator...