Junfeng Tian

2 comments by Junfeng Tian

Instead of the universal checkpoint, I use the code from @tjruwase to convert a DeepSpeed checkpoint saved on 128 ranks, without tensor parallelism (TP) or pipeline parallelism (PP), into another DeepSpeed checkpoint for 32 ranks: https://gist.github.com/rgtjf/aa90fc37efe38ad773046623780a1026 Discussions...
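To illustrate the idea behind this kind of rank-count conversion, here is a minimal sketch in plain Python. It is not the code from the gist and not a DeepSpeed API. It assumes that, with pure data parallelism, each rank holds one contiguous, equally sized slice of a single flat buffer, with padding only at the tail. The function name `reshard_flat_partitions` and its parameters are hypothetical. A real checkpoint has more to handle (one buffer per parameter group, optimizer state, alignment padding), none of which is covered here.

```python
from typing import List, Sequence


def reshard_flat_partitions(
    partitions: Sequence[Sequence[float]],
    numel: int,
    new_world_size: int,
    pad_value: float = 0.0,
) -> List[List[float]]:
    """Merge per-rank flat partitions and split them for a new rank count.

    Hypothetical helper: assumes each old rank holds a contiguous, equally
    sized slice of one flat buffer, padded only at the tail.
    """
    if new_world_size <= 0:
        raise ValueError("new_world_size must be positive")

    # 1. Concatenate the old partitions in rank order.
    flat = [x for part in partitions for x in part]
    if numel > len(flat):
        raise ValueError("numel exceeds the total size of the partitions")

    # 2. Drop the tail padding added for the old rank count.
    flat = flat[:numel]

    # 3. Re-pad so the buffer divides evenly across the new rank count.
    shard_size = -(-numel // new_world_size)  # ceiling division
    flat.extend([pad_value] * (shard_size * new_world_size - numel))

    # 4. Slice into one contiguous partition per new rank.
    return [
        flat[rank * shard_size:(rank + 1) * shard_size]
        for rank in range(new_world_size)
    ]


# Toy usage: 1000 real elements spread over 128 ranks, resharded to 32 ranks.
numel = 1000
old_world_size = 128
old_shard = -(-numel // old_world_size)
padded = [float(i) for i in range(numel)]
padded += [0.0] * (old_shard * old_world_size - numel)
old_parts = [
    padded[r * old_shard:(r + 1) * old_shard] for r in range(old_world_size)
]
new_parts = reshard_flat_partitions(old_parts, numel, new_world_size=32)
```

The sketch passes the true element count (`numel`) explicitly because padding cannot be told apart from real values once the partitions are merged.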