Make sure deepspeed-powered models are equivalent to their non-deepspeed versions
@DanielHesslow has opened a PR https://github.com/bigscience-workshop/Megatron-DeepSpeed/pull/212. This allows us to evaluate Megatron-Deepspeed models using the EAI harness directly in this repo, without needing to convert models into HF format.
The current issue is that we train models using deepspeed (`*ModelPipe`), but the evaluation script loads a model without deepspeed (`*Model`). This can create discrepancies between the two. Example: https://github.com/bigscience-workshop/Megatron-DeepSpeed/pull/222
We need a test making sure that the outputs of both models are equal for an arbitrary configuration (regardless of whether we merge #212).
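As a rough illustration of what such a test could look like, here is a minimal sketch. The helper name `assert_model_outputs_close` and the toy "models" are hypothetical, not part of this repo; a real test would load the actual `*ModelPipe` and `*Model` variants from the same checkpoint. Note the comparison uses a tolerance rather than exact equality, since op ordering can legitimately differ between the two code paths.

```python
import numpy as np

def assert_model_outputs_close(model_a, model_b, input_batch, rtol=1e-5, atol=1e-6):
    """Run two model callables on the same input and compare outputs.

    Exact bitwise equality is usually too strict for floating point,
    so we compare within a tolerance (hypothetical helper, for illustration).
    """
    out_a = model_a(input_batch)
    out_b = model_b(input_batch)
    np.testing.assert_allclose(out_a, out_b, rtol=rtol, atol=atol)

# Toy stand-ins for the two model variants: the "same" linear layer
# applied to the whole batch vs. in two micro-batches (as a pipeline might).
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8)).astype(np.float32)
x = rng.standard_normal((4, 8)).astype(np.float32)

whole = lambda inp: inp @ W
chunked = lambda inp: np.concatenate([inp[:2] @ W, inp[2:] @ W], axis=0)

assert_model_outputs_close(whole, chunked, x)
print("outputs match within tolerance")
```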
cc @SaulLu @DanielHesslow @TevenLeScao
Hi, thanks for providing this amazing Bloom model! Just a quick question: has anyone verified that the DeepSpeed and Hugging Face checkpoints lead to identical outputs?
Hi @philippmtk, unfortunately they don't lead to identical outputs. Resharding checkpoints changes the order of operations, which is a problem because floating-point arithmetic is not associative. Please refer to this issue: https://github.com/pytorch/pytorch/issues/76232