[zero] Suggests a minor change to confusing variable names in the ZeRO optimizer.
📌 Checklist before creating the PR
- [ ] I have created an issue for this PR for traceability
- [x] The title follows the standard format: [doc/gemini/tensor/...]: A concise description
- [ ] I have added relevant tags if possible for us to better distinguish different PRs
🚨 Issue number
N/A
📝 What does this PR do?
It seems that the variable names related to the mixed-precision parameter groups do not fully describe their characteristics, so I suggest a few changes. These changes are very trivial, but hopefully they will alleviate some of the confusion for beginners like me.
Currently, the full set of parameter groups is named fp16_param_groups, and the shards managed by the GPU at the current rank are named fp32_flat_param_groups_of_current_rank. These names describe the tensors accurately when the master weight is a half tensor or the dtype specified in the __init__ method is fp16; in other cases, however, they no longer correspond to what the variables actually hold. I would therefore like them to be renamed according to the sharding state rather than the data type, following PyTorch's FSDP convention (with names like flatten_sharded_optim_state_dict and full_optim_state_dict).
A related and even more trivial point: the param_store methods do not seem to need to specify fp16 at all (see the sketch below).
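To make the suggestion a bit more concrete, here is a minimal sketch of the direction I have in mind. The class and attribute names below (ShardedOptimizerNamingSketch, _full_param_groups, _sharded_flat_param_groups_of_current_rank) are purely illustrative placeholders, not a proposed final API:

```python
from typing import Dict, List

import torch


class ShardedOptimizerNamingSketch:
    """Illustrates the suggested naming direction; not the actual ColossalAI class."""

    def __init__(self) -> None:
        # Current names are tied to an assumed dtype:
        #   self._fp16_param_groups                        # working params, may actually be bf16/fp32
        #   self._fp32_flat_param_groups_of_current_rank   # master copy, may not be fp32 at all
        #
        # Suggested direction: name them after the sharding state instead,
        # mirroring FSDP's full_optim_state_dict / flatten_sharded_optim_state_dict.
        self._full_param_groups: Dict[int, List[torch.Tensor]] = {}
        self._sharded_flat_param_groups_of_current_rank: Dict[int, torch.Tensor] = {}
```

In the same spirit, the param_store helpers could drop the fp16 qualifier from their names (for example, a hypothetical get_fp16_params_by_rank_group would simply become get_params_by_rank_group).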
Thank you :)
💥 Checklist before requesting a review
- [ ] I have linked my PR to an issue (instruction)
- [ ] My issue clearly describes the problem/feature/proposal, with diagrams/charts/table/code if possible
- [x] I have performed a self-review of my code
- [ ] I have added thorough tests.
- [x] I have added docstrings for all the functions/methods I implemented
⭐️ Do you enjoy contributing to Colossal-AI?
- [x] 🌝 Yes, I do.
- [ ] 🌚 No, I don't.
Tell us more if you don't enjoy contributing to Colossal-AI.