[BUG] Does bf16 support Zero stage 1 with pipeline?
Describe the bug
I'm using Megatron-DeepSpeed. When I enable pipeline parallelism and set
"bf16": {
"enabled": "auto"
}
the run raises a NotImplementedError in
/opt/conda/lib/python3.8/site-packages/deepspeed/runtime/pipe/engine.py:
def _exec_reduce_grads(self):
    self._force_grad_boundary = True
    if self.pipeline_enable_backward_allreduce:
        if self.bfloat16_enabled():
            if self.zero_optimization_stage() == 0:
                self._bf16_reduce_grads()
            else:
                # stage 1 passes this assert but then hits an unconditional raise
                assert self.zero_optimization_stage() == 1, "only bf16 + z1 are supported"
                raise NotImplementedError()
        else:
            self.allreduce_gradients(bucket_size=MEMORY_OPT_ALLREDUCE_SIZE)
    self._force_grad_boundary = False
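For reference, a minimal repro sketch of the failing combination. Only the bf16 + ZeRO stage 1 + PipelineModule combination matters here; the file name, layer sizes, batch sizes, and optimizer choice are placeholders, and it assumes bf16-capable hardware (untested as written):

# repro.py -- minimal sketch; launch with: deepspeed --num_gpus 2 repro.py
import torch
import torch.nn as nn
import deepspeed
from deepspeed.pipe import PipelineModule

ds_config = {
    "train_batch_size": 4,
    "train_micro_batch_size_per_gpu": 1,
    "optimizer": {"type": "Adam", "params": {"lr": 1e-3}},
    "bf16": {"enabled": True},          # the resolved value of "auto" above
    "zero_optimization": {"stage": 1},  # the combination the branch rejects
}

# Toy two-stage pipeline; real runs use Megatron-DeepSpeed's layers.
model = PipelineModule(layers=[nn.Linear(16, 16) for _ in range(4)],
                       num_stages=2,
                       loss_fn=nn.MSELoss())

engine, _, _, _ = deepspeed.initialize(model=model,
                                       model_parameters=model.parameters(),
                                       config=ds_config)

# train_batch() runs forward/backward and then _exec_reduce_grads,
# which is where the NotImplementedError above is raised.
batches = iter([(torch.randn(1, 16), torch.randn(1, 16))] * 4)
engine.train_batch(data_iter=batches)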
But when using the Transformers-integrated DeepSpeed with ZeRO stage 1/2/3, it works fine. The only difference I found is that in Megatron-DeepSpeed the model is a subclass of PipelineModule, whereas in Transformers it is not.
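That difference would explain the behavior: deepspeed.initialize only builds a PipelineEngine (the class that defines _exec_reduce_grads above) when the model is a PipelineModule; otherwise it returns a plain DeepSpeedEngine and this code path is never reached. A small helper to confirm which path you are on (a sketch; engine stands for the first value returned by deepspeed.initialize):

from deepspeed.runtime.pipe.engine import PipelineEngine

def uses_pipeline_grad_path(engine) -> bool:
    # PipelineEngine subclasses DeepSpeedEngine; only instances of it
    # schedule _exec_reduce_grads during train_batch(), so only they
    # can hit the NotImplementedError quoted above.
    return isinstance(engine, PipelineEngine)

With the Transformers integration this returns False, which matches ZeRO stage 1/2/3 working fine there.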
I wonder whether DeepSpeed currently supports bf16 with ZeRO stage 1 under pipeline parallelism, or whether this is just a mistake in my code.
@lyj201002, thanks for reporting this bug. We are working on a fix.
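Until the fix lands, the branch quoted above does route bf16 + ZeRO stage 0 through self._bf16_reduce_grads() instead of raising, so dropping to stage 0 may be a stopgap. A config sketch under that assumption; it gives up ZeRO-1's optimizer-state sharding, values are placeholders, and config validation in your DeepSpeed version may add constraints, so treat it as untested:

# Stopgap sketch: keep bf16 but use ZeRO stage 0, which per the quoted
# branch dispatches to _bf16_reduce_grads() rather than raising.
ds_config = {
    "train_batch_size": 4,
    "train_micro_batch_size_per_gpu": 1,
    "optimizer": {"type": "Adam", "params": {"lr": 1e-3}},
    "bf16": {"enabled": True},
    "zero_optimization": {"stage": 0},
}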