Jiarui Fang(方佳瑞)
### What's the PR for
Previously, Gemini was only tested on GPT2. After more types of test models were added, the unit test fails. You can merge this PR. But @1SAA...
With BERT and grad_checkpoint = True, the number of sampling points detected by the runtime tracer does not match the number detected by Gemini's tracer.
### Why
Make sure users can run OPT and profile its performance within 1 minute: no data download and no complex training-parameter setup, just a few iterations.
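A download-free profiling run of this kind can be sketched as below. This is a minimal, framework-free illustration under stated assumptions, not the actual OPT example: `make_synthetic_batch`, `profile_steps` and `dummy_step` are hypothetical names, the batch is random token IDs in place of a dataset, and the vocabulary size of 50272 is an assumed OPT-like value.

```python
import random
import time
from typing import Callable, List

Batch = List[List[int]]


def make_synthetic_batch(batch_size: int, seq_len: int, vocab_size: int, seed: int = 0) -> Batch:
    """Random token IDs stand in for a real dataset, so nothing has to be downloaded."""
    rng = random.Random(seed)
    return [[rng.randrange(vocab_size) for _ in range(seq_len)] for _ in range(batch_size)]


def profile_steps(step_fn: Callable[[Batch], None], batch: Batch,
                  warmup: int = 2, iters: int = 5) -> List[float]:
    """Run a few untimed warm-up steps, then time `iters` steps and return seconds per step."""
    for _ in range(warmup):
        step_fn(batch)  # warm-up steps are not timed
    timings = []
    for _ in range(iters):
        start = time.perf_counter()
        step_fn(batch)
        timings.append(time.perf_counter() - start)
    return timings


if __name__ == "__main__":
    # Hypothetical placeholder for an OPT training step; a real step would run
    # forward, backward and optimizer.step() on the model.
    def dummy_step(batch: Batch) -> None:
        sum(sum(row) for row in batch)

    batch = make_synthetic_batch(batch_size=4, seq_len=128, vocab_size=50272)
    times = profile_steps(dummy_step, batch)
    print(f"mean step time: {sum(times) / len(times):.6f} s")
```

When the step runs on a GPU, CUDA kernels launch asynchronously, so a real version would need to synchronize the device (for example with `torch.cuda.synchronize()`) before reading the timer; otherwise the measured times mostly reflect launch overhead.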
I converted this to a draft because GeminiDDP cannot run inference; its states would be wrong.
## What's new
This PR removes LowLevelZeroOptimizer's dependency on gpc. gpc is a global variable; if LowLevelZeroOptimizer uses it, we cannot use LowLevelZeroOptimizer together with ColoTensor TP. This...
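Why a global context gets in the way of tensor parallelism can be illustrated with a toy sketch. Everything here is hypothetical and is not ColossalAI's API: `RankGroup` is a stand-in for a communication group, `GLOBAL_CONTEXT` plays a role loosely analogous to gpc, and the two optimizer classes only compute how many elements each rank would hold. The point is that an optimizer which reads its data-parallel group from a global cannot be told to shard over a smaller group that excludes its tensor-parallel peers, while one that receives the group as an argument can.

```python
from typing import Dict, Iterable


class RankGroup:
    """Hypothetical stand-in for a communication group: just the ranks it contains."""

    def __init__(self, ranks: Iterable[int]):
        self.ranks = tuple(ranks)

    def size(self) -> int:
        return len(self.ranks)


# Hypothetical process-wide context (loosely analogous to gpc): one fixed data-parallel group.
GLOBAL_CONTEXT: Dict[str, RankGroup] = {"dp_group": RankGroup(range(8))}


class GlobalBoundOptimizer:
    """Reads its group from the global, so every instance shards over the same ranks."""

    def __init__(self) -> None:
        self.dp_group = GLOBAL_CONTEXT["dp_group"]

    def shard_size(self, numel: int) -> int:
        return -(-numel // self.dp_group.size())  # ceiling division


class DecoupledOptimizer:
    """Receives its group explicitly, so the caller decides which ranks it shards over."""

    def __init__(self, dp_group: RankGroup) -> None:
        self.dp_group = dp_group

    def shard_size(self, numel: int) -> int:
        return -(-numel // self.dp_group.size())  # ceiling division
```

For example, with 8 ranks and a TP degree of 2 where TP pairs are assumed to be adjacent ranks, rank 0's data-parallel group would be `RankGroup([0, 2, 4, 6])`. `DecoupledOptimizer` can be built with that group, whereas `GlobalBoundOptimizer` always shards over all 8 ranks.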
### Describe the feature
## Current States
Currently, GeminiDDP has to shard the optimizer state, and it covers zero3 and zero3+offload. However, it doesn't actually cover zero1 and zero3....
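For context on what the ZeRO stages shard, the sketch below uses the per-GPU model-state accounting from the ZeRO paper (Rajbhandari et al.). It is not taken from this issue and does not describe GeminiDDP's chunk-based memory layout. The assumptions are mixed-precision Adam with 2 bytes of fp16 weights, 2 bytes of fp16 gradients and K = 12 bytes of optimizer state per parameter. Offloading and activation memory are not modeled, and `zero_model_state_bytes` is a hypothetical helper name.

```python
def zero_model_state_bytes(num_params: float, dp_degree: int, stage: int, k: int = 12) -> float:
    """Per-GPU bytes for model states under mixed-precision Adam, following the ZeRO paper's model.

    Each parameter costs 2 bytes (fp16 weights) + 2 bytes (fp16 grads) + k bytes of
    optimizer state (k = 12 for fp32 master weights, momentum and variance).
    """
    psi, n = num_params, dp_degree
    if stage == 0:  # plain data parallelism: nothing is sharded
        return (2 + 2 + k) * psi
    if stage == 1:  # zero1: shard optimizer states only
        return 2 * psi + 2 * psi + k * psi / n
    if stage == 2:  # zero2: also shard gradients
        return 2 * psi + (2 + k) * psi / n
    if stage == 3:  # zero3: also shard parameters
        return (2 + 2 + k) * psi / n
    raise ValueError(f"unknown ZeRO stage: {stage}")


if __name__ == "__main__":
    # 7.5B parameters on 64 data-parallel GPUs, the example size used in the ZeRO paper.
    for stage in range(4):
        gb = zero_model_state_bytes(7.5e9, 64, stage) / 1e9
        print(f"stage {stage}: {gb:.2f} GB per GPU")
```

The formulas show why the stages form a ladder. zero1 still keeps full fp16 weights and gradients on every GPU. zero3 divides all three terms by the data-parallel degree, which is why covering zero3 does not automatically give the zero1 or zero2 behavior.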