Megatron-DeepSpeed
Questions about inconsistent evaluation results
Hi, I used the DeepSpeed framework to train a GPT 117M model. When I evaluate the model's performance on WikiText-103, there is a large gap in PPL between the result reported by `tasks/eval_harness/evaluate.py` and the result I get by first converting the checkpoint to Megatron format and then running `tasks/main.py`. May I ask what could be causing this discrepancy? @mayank31398
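
To make the comparison concrete: I am not sure whether the two scripts normalize perplexity the same way, but a gap of the size I am seeing could also appear if one number is token-level PPL (normalized by BPE token count) and the other is word-level PPL (normalized by word count). Below is a minimal sketch of that relationship; the totals and counts are hypothetical placeholders, not values from my runs.

```python
import math

# Hypothetical numbers for illustration only -- substitute the totals that
# each evaluation script actually reports for WikiText-103.
total_nll = 840_000.0          # summed negative log-likelihood over the test set (nats)
num_subword_tokens = 280_000   # number of BPE tokens the model was scored on
num_words = 245_000            # word-level count of the original text

# Token-level perplexity: normalize the total NLL by the subword token count.
ppl_token = math.exp(total_nll / num_subword_tokens)

# Word-level perplexity: normalize the same total NLL by the word count.
ppl_word = math.exp(total_nll / num_words)

# Equivalent conversion when only the token-level PPL and the two counts are known:
# ppl_word == ppl_token ** (num_subword_tokens / num_words)
ppl_word_converted = ppl_token ** (num_subword_tokens / num_words)

print(f"token-level PPL : {ppl_token:.2f}")   # ~20.1 with the numbers above
print(f"word-level PPL  : {ppl_word:.2f}")    # ~30.8 from the same model outputs
print(f"converted       : {ppl_word_converted:.2f}")
```

With these placeholder numbers the same model yields roughly 20 vs. 31, so I would like to rule out (or confirm) a normalization difference before assuming the checkpoint conversion itself is at fault.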