Results: 4 issues of henan991201
I want to ask: why can't we increase the TP and PP of a checkpoint? For example, setting TP=4 when the original TP is 2. I tried to do...
In training, I used swiglu, TP=4, PP=2. I used deepspeed_to_deepspeed.py to convert the checkpoint to a TP=1, PP=1 one. When evaluating the converted checkpoint, I found that the accuracy...
My training was interrupted, and I want to resume it. But it seems that only the adapter model weights are loaded, and the learning rate still restarts from the beginning. Thanks...
I noticed you evaluated the OPT-175B model. How did you convert it to a Megatron-DeepSpeed checkpoint? I cannot find a 175B Hugging Face Transformers checkpoint. Also, I cannot successfully convert...