Megatron-DeepSpeed
Questions about inconsistent evaluation results
Hi, I used the DeepSpeed framework to train a GPT 117M model. When I evaluate the model's performance on WikiText-103, there is a large gap in PPL between the result reported by `tasks/eval_harness/evaluate.py` and the result I get by first converting the checkpoint to Megatron format and then running `tasks/main.py`. May I ask what could be causing this discrepancy? @mayank31398
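
To make the comparison concrete: I am not sure whether the two scripts normalize perplexity the same way, but a gap of the size I am seeing could also appear if one number is token-level PPL (normalized by BPE token count) and the other is word-level PPL (normalized by word count). Below is a minimal sketch of that relationship; the totals and counts are hypothetical placeholders, not values from my runs.

```python
import math

# Hypothetical numbers for illustration only -- substitute the totals that
# each evaluation script actually reports for WikiText-103.
total_nll = 840_000.0          # summed negative log-likelihood over the test set (nats)
num_subword_tokens = 280_000   # number of BPE tokens the model was scored on
num_words = 245_000            # word-level count of the original text

# Token-level perplexity: normalize the total NLL by the subword token count.
ppl_token = math.exp(total_nll / num_subword_tokens)

# Word-level perplexity: normalize the same total NLL by the word count.
ppl_word = math.exp(total_nll / num_words)

# Equivalent conversion when only the token-level PPL and the two counts are known:
# ppl_word == ppl_token ** (num_subword_tokens / num_words)
ppl_word_converted = ppl_token ** (num_subword_tokens / num_words)

print(f"token-level PPL : {ppl_token:.2f}")   # ~20.1 with the numbers above
print(f"word-level PPL  : {ppl_word:.2f}")    # ~30.8 from the same model outputs
print(f"converted       : {ppl_word_converted:.2f}")
```

With these placeholder numbers the same model yields roughly 20 vs. 31, so I would like to rule out (or confirm) a normalization difference before assuming the checkpoint conversion itself is at fault.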