henan991201

4 issues opened by henan991201:

Why can't we increase the TP and PP degrees of a checkpoint? For example, setting TP=4 when the original TP is 2. I tried to do...

During training, I used SwiGLU with TP=4 and PP=2. I used deepspeed_to_deepspeed.py to convert the checkpoint to TP=1, PP=1. When evaluating the converted checkpoint, I found that the accuracy...

My training was interrupted and I want to resume it. However, only the adapter model weights seem to be loaded, and the learning rate restarts from scratch. Thanks...

I noticed you evaluated the OPT-175B model. How did you convert it to a Megatron-DeepSpeed checkpoint? I cannot find a 175B Hugging Face Transformers checkpoint. Also, I cannot successfully convert...