Fix `PipelineEngine.eval_batch` result
With fp16 enabled, `PipelineEngine.eval_batch` does not broadcast the loss correctly. On the last pipeline stage, `eval_batch` returns the fp16 loss, while on all other stages it returns noise.
```python
def _bcast_pipe_scalar(self, data, src_rank=None, dtype=torch.float32):
    # Default to last stage (e.g., for broadcasting loss)
    if src_rank is None:
        src_rank = self.grid.stage_to_global(self.num_stages - 1)
    assert src_rank in self.grid.pp_group

    if self.global_rank == src_rank:
        result = data.clone().detach()  # fp16 tensor when fp16 is enabled
    else:
        result = torch.Tensor([0.]).type(dtype).to(self.device)  # fp32 tensor

    # An fp16 tensor is broadcast into fp32 buffers here, so the result is noise.
    dist.broadcast(tensor=result,
                   src=src_rank,
                   group=self.mpu.get_pipe_parallel_group())
    return result
```
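A minimal sketch of one possible fix, assuming the intent is that `_bcast_pipe_scalar` should always return a tensor of the requested `dtype` (fp32 by default): cast the loss to `dtype` on the source rank before broadcasting, so every rank participates with a matching buffer. This is only a sketch, not necessarily the final patch:

```python
def _bcast_pipe_scalar(self, data, src_rank=None, dtype=torch.float32):
    # Default to last stage (e.g., for broadcasting loss)
    if src_rank is None:
        src_rank = self.grid.stage_to_global(self.num_stages - 1)
    assert src_rank in self.grid.pp_group

    if self.global_rank == src_rank:
        # Cast the (possibly fp16) loss to the requested dtype before broadcasting,
        # so the sender's buffer matches the fp32 receive buffers on other stages.
        result = data.clone().detach().type(dtype).to(self.device)
    else:
        result = torch.Tensor([0.]).type(dtype).to(self.device)

    dist.broadcast(tensor=result,
                   src=src_rank,
                   group=self.mpu.get_pipe_parallel_group())
    return result
```

An alternative would be to allocate the receive buffers in the sender's dtype instead, but casting on the source rank keeps the function's `dtype` contract, so callers of `eval_batch` still get an fp32 scalar on every stage.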
Environment:
- torch 1.13.1
- CUDA 11.7
- GPU: A100 40GB, driver 450.80.02
We have been working on language models recently and encountered this problem. I am trying to fix it. @ShadenSmith @duli2012