torchacc
Support save and load of fsdp_optim_state
What this PR does:
- Support save and load of the flattened (including padding before sharding) and unflattened full_optim_state_dict, covered by unit tests.
- Support save and load of shard_optim_state_dict.
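
For reviewers unfamiliar with the "padding before shard" step, here is a minimal sketch of the idea, not the code in this PR. It assumes the flattened state is zero-padded so that its length divides evenly by the world size, and it uses plain Python lists in place of tensors. The helper names `flatten_pad_shard` and `gather_unpad_unflatten` are hypothetical and are not torchacc APIs.

```python
from math import prod


def flatten_pad_shard(values, world_size):
    """Flatten a nested list, zero-pad it, and split it into equal shards.

    Returns (shards, numel), where numel is the unpadded element count
    that is needed later to strip the padding again.
    """
    flat = []

    def _walk(x):
        # Depth-first traversal gives row-major (C-order) flattening.
        if isinstance(x, list):
            for item in x:
                _walk(item)
        else:
            flat.append(x)

    _walk(values)
    numel = len(flat)
    # Pad so that every rank receives a shard of the same size.
    pad = (-numel) % world_size
    flat.extend([0.0] * pad)
    shard_size = len(flat) // world_size
    shards = [flat[r * shard_size:(r + 1) * shard_size]
              for r in range(world_size)]
    return shards, numel


def gather_unpad_unflatten(shards, numel, shape):
    """Concatenate shards, drop the padding, and restore the original shape."""
    flat = [v for shard in shards for v in shard][:numel]
    assert prod(shape) == numel, "shape does not match the unpadded size"

    def _build(offset, dims):
        if not dims:
            return flat[offset]
        stride = prod(dims[1:])
        return [_build(offset + i * stride, dims[1:]) for i in range(dims[0])]

    return _build(0, list(shape))


# Example: a 2x5 optimizer state (e.g. exp_avg) sharded across 4 ranks.
exp_avg = [[1.0, 2.0, 3.0, 4.0, 5.0], [6.0, 7.0, 8.0, 9.0, 10.0]]
shards, numel = flatten_pad_shard(exp_avg, world_size=4)
restored = gather_unpad_unflatten(shards, numel, shape=(2, 5))
```

In this sketch the unpadded element count and the original shape have to be stored next to the shards, because the padding cannot be removed without them when loading.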
TODO:
- Test the memory usage of checkpointing a 70B model.
- shard_param_on_dim_0(?)