Make sure deepspeed-powered models are equivalent to their non-deepspeed versions
@DanielHesslow has opened a PR https://github.com/bigscience-workshop/Megatron-DeepSpeed/pull/212. This allows us to evaluate Megatron-Deepspeed models using the EAI harness directly in this repo, without needing to convert models into HF format.
The current issue is that we train models using deepspeed (`*ModelPipe`), but the evaluation script loads a model without deepspeed (`*Model`). This can create discrepancies between the two. Example: https://github.com/bigscience-workshop/Megatron-DeepSpeed/pull/222
We need a test making sure that the outputs of both models are equal for an arbitrary configuration (regardless of whether we merge #212).
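As a rough illustration of what such a test could look like, here is a minimal sketch. The helper name `assert_model_outputs_close` and the toy "models" are hypothetical, not part of this repo; a real test would load the actual `*ModelPipe` and `*Model` variants from the same checkpoint. Note the comparison uses a tolerance rather than exact equality, since op ordering can legitimately differ between the two code paths.

```python
import numpy as np

def assert_model_outputs_close(model_a, model_b, input_batch, rtol=1e-5, atol=1e-6):
    """Run two model callables on the same input and compare outputs.

    Exact bitwise equality is usually too strict for floating point,
    so we compare within a tolerance (hypothetical helper, for illustration).
    """
    out_a = model_a(input_batch)
    out_b = model_b(input_batch)
    np.testing.assert_allclose(out_a, out_b, rtol=rtol, atol=atol)

# Toy stand-ins for the two model variants: the "same" linear layer
# applied to the whole batch vs. in two micro-batches (as a pipeline might).
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8)).astype(np.float32)
x = rng.standard_normal((4, 8)).astype(np.float32)

whole = lambda inp: inp @ W
chunked = lambda inp: np.concatenate([inp[:2] @ W, inp[2:] @ W], axis=0)

assert_model_outputs_close(whole, chunked, x)
print("outputs match within tolerance")
```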
cc @SaulLu @DanielHesslow @TevenLeScao
Hi, thanks for providing this amazing Bloom model! Just a quick question: has anyone verified that the DeepSpeed and Hugging Face checkpoints lead to identical outputs?
Hi @philippmtk, unfortunately they don't lead to identical outputs. Resharding checkpoints changes the order of operations, which is a problem because floating-point arithmetic is not associative. Please refer to this issue: https://github.com/pytorch/pytorch/issues/76232