Baichuan-7B
[Question] DeepSpeed ZeRO-3 save_checkpoint() produces empty model_states files
Required prerequisites
- [X] I have read the documentation https://github.com/baichuan-inc/baichuan-7B/blob/HEAD/README.md.
- [X] I have searched the Issue Tracker and Discussions that this hasn't already been reported. (+1 or comment there if it has.)
- [X] Consider asking first in a Discussion.
Questions
Hi, I used the code to continue pretraining the model with ZeRO-3. However, my checkpoint files zero_pp_rank_*_mp_rank_00_model_states.pt are empty: they contain only the parameter names and shapes, not the weights. Have you encountered this problem, and how can it be fixed?
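A likely explanation, as far as we understand ZeRO-3: each parameter is split across the data-parallel ranks, so the module state dict written to `zero_pp_rank_*_mp_rank_00_model_states.pt` holds only placeholders (names and shapes), while each rank's fp32 slice of the weights ends up in its `*_optim_states.pt` file. The pure-Python toy below simulates such a save. It is a sketch, not DeepSpeed's real file format (which adds alignment padding, dtype handling and parameter groups), and `zero3_toy_save` is a hypothetical name.

```python
import math
from typing import Dict, List, Tuple

Shape = Tuple[int, ...]


def zero3_toy_save(params: Dict[str, List[float]], shapes: Dict[str, Shape], world_size: int):
    """Toy ZeRO-3 save (hypothetical, illustration only).

    Returns (model_states, optim_states), one entry per rank:
    - model_states[r] mimics model_states.pt: parameter names and shapes only.
    - optim_states[r] mimics optim_states.pt: rank r's flat slice of every parameter.
    """
    model_states = [dict(shapes) for _ in range(world_size)]
    optim_states: List[List[float]] = [[] for _ in range(world_size)]
    for name, shape in shapes.items():
        values = params[name]
        assert len(values) == math.prod(shape), f"{name}: shape mismatch"
        part = math.ceil(len(values) / world_size)  # slice size per rank
        padded = values + [0.0] * (part * world_size - len(values))  # pad to split evenly
        for rank in range(world_size):
            optim_states[rank].extend(padded[rank * part:(rank + 1) * part])
    return model_states, optim_states


if __name__ == "__main__":
    params = {"w": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0], "b": [7.0, 8.0, 9.0]}
    shapes = {"w": (2, 3), "b": (3,)}
    model_states, optim_states = zero3_toy_save(params, shapes, world_size=2)
    print(model_states[0])  # names and shapes only, no weights
    print(optim_states)     # the actual numbers, split across ranks
```

No single file holds a complete weight tensor, and the "model states" carry no numbers at all, which matches what the question describes.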
Thanks!
Checklist
- [X] I have provided all relevant and necessary information above.
- [X] I have chosen a suitable title for this issue.
I ran into the same problem; my workaround was to use DeepSpeed ZeRO-2 instead of ZeRO-3.
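Switching stages is a one-key change in the DeepSpeed config. Below is a minimal sketch that builds such a config as a Python dict. The batch-size values and `bf16` setting are placeholder assumptions, and `make_ds_config` is a hypothetical helper. Under ZeRO-2 parameters are not partitioned (only optimizer states and gradients are), so saved weights should be complete, at the cost of more GPU memory per rank. For staying on ZeRO-3, the sketch also sets `stage3_gather_16bit_weights_on_model_save`, which tells DeepSpeed to gather full 16-bit weights when `engine.save_16bit_model(...)` is called.

```python
import json


def make_ds_config(stage: int) -> dict:
    """Build a minimal DeepSpeed config dict (hypothetical helper; values are placeholders)."""
    if stage not in (0, 1, 2, 3):
        raise ValueError(f"invalid ZeRO stage: {stage}")
    config = {
        "train_micro_batch_size_per_gpu": 1,  # placeholder
        "gradient_accumulation_steps": 1,     # placeholder
        "bf16": {"enabled": True},            # assumption: bf16 training
        "zero_optimization": {"stage": stage},
    }
    if stage == 3:
        # Gather the partitioned 16-bit weights when save_16bit_model() is called.
        config["zero_optimization"]["stage3_gather_16bit_weights_on_model_save"] = True
    return config


if __name__ == "__main__":
    # Write this out as ds_config.json and pass it to the launcher / deepspeed.initialize.
    print(json.dumps(make_ds_config(2), indent=2))
```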
My solution is to save checkpoints myself, or you can use zero_to_fp32.
@mynewstart I found that my converted checkpoint global_step_xxx contains meaningful *optim_states.pt files, but the *model_states.pt files are empty. Any clues on this? Thanks.
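As far as we understand, this layout is expected under ZeRO-3: the weights live in the optimizer-state files as per-rank fp32 slices, and DeepSpeed's zero_to_fp32 utility stitches them back together using the shapes recorded alongside the model states. The toy below sketches that reconstruction in pure Python. It assumes the simplified layout where each parameter is padded and split evenly across ranks in a fixed order; the real files differ in detail, and `toy_zero_to_fp32` is a hypothetical name, not DeepSpeed's API.

```python
import math
from typing import Dict, List, Tuple

Shape = Tuple[int, ...]


def toy_zero_to_fp32(optim_states: List[List[float]], shapes: Dict[str, Shape]) -> Dict[str, List[float]]:
    """Rebuild full (flattened) parameters from per-rank slices (toy model only).

    optim_states[r] is rank r's flat slice of every parameter, in `shapes` order.
    """
    world_size = len(optim_states)
    offset = 0  # position of the current parameter's slice within each rank's flat list
    full: Dict[str, List[float]] = {}
    for name, shape in shapes.items():
        numel = math.prod(shape)
        part = math.ceil(numel / world_size)  # same slice size used when saving
        values: List[float] = []
        for rank_slice in optim_states:       # concatenate slices in rank order
            values.extend(rank_slice[offset:offset + part])
        full[name] = values[:numel]           # drop the padding
        offset += part
    return full


if __name__ == "__main__":
    shapes = {"w": (2, 3), "b": (3,)}
    optim_states = [[1.0, 2.0, 3.0, 7.0, 8.0], [4.0, 5.0, 6.0, 9.0, 0.0]]
    print(toy_zero_to_fp32(optim_states, shapes))
```

In practice one would use the real utility rather than this toy: DeepSpeed copies a `zero_to_fp32.py` script into the checkpoint directory, and recent versions also expose `get_fp32_state_dict_from_zero_checkpoint` in `deepspeed.utils.zero_to_fp32` (check the arguments for your installed version).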