DeepSpeed
Fix regression in shard checkpoint loading in the AutoTP path caused by the removal of qkv_copy(), and add a UT case for shard checkpoint loading in AutoTP
- I added a UT for shard loading in the AutoTP path because this code was not exercised in CI, so errors such as "qkv_copy() is replaced by strided_copy()" were not caught at merge time.
- The UT covers "bigscience/bloom-560m", "EleutherAI/gpt-j-6B", "EleutherAI/gpt-neo-125M", and "facebook/opt-125m". I also fixed the problems found with gpt-neo-125m and opt-125m.
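
For context, here is a rough pure-Python sketch of what a strided copy of a fused QKV weight means when sharding across tensor-parallel ranks. It is only an illustration, not the DeepSpeed implementation: the name `strided_shard` and its parameters are hypothetical, and it assumes the fused weight stacks equally sized Q, K, and V blocks row-wise.

```python
def strided_shard(fused_rows, num_fused, rank, world_size):
    """Return the rows of a fused [Q; K; V] weight owned by `rank`.

    Hypothetical helper for illustration only (not a DeepSpeed API).
    Assumes `fused_rows` stacks `num_fused` equally sized blocks.
    """
    total = len(fused_rows)
    if total % num_fused != 0:
        raise ValueError("fused weight does not split evenly into blocks")
    block = total // num_fused          # rows per Q, K, or V block
    if block % world_size != 0:
        raise ValueError("block does not split evenly across ranks")
    per_rank = block // world_size      # rows each rank takes from a block

    shard = []
    for i in range(num_fused):
        # Take this rank's slice from inside each block, not one
        # contiguous slice of the whole fused weight.
        start = i * block + rank * per_rank
        shard.extend(fused_rows[start:start + per_rank])
    return shard


# Toy fused weight: 4 rows each for Q, K, and V.
fused = ["q0", "q1", "q2", "q3",
         "k0", "k1", "k2", "k3",
         "v0", "v1", "v2", "v3"]

rank0 = strided_shard(fused, num_fused=3, rank=0, world_size=2)
rank1 = strided_shard(fused, num_fused=3, rank=1, world_size=2)
```

A plain contiguous split of the same toy weight would give rank 0 all of Q plus half of K and no V rows, which is why the fused case needs the strided form.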
@tjruwase @delock @yao-matrix
@tjruwase I added another commit in response to https://github.com/microsoft/DeepSpeed/commit/db26f8b41325be2a7f7af8b386b4e8951a5a76c9. The latest merged code assumes that only the KI path supports shard loading, but I have already added that support to the AutoTP path; see the usage in the UT.