DeepSpeed
[REQUEST] BF16 mixed precision => grad accum in fp32
Is your feature request related to a problem? Please describe.
We have shown with the BLOOM training, using Megatron-DeepSpeed, that BF16 is far superior to FP16 for mixed precision training.
But Megatron-DeepSpeed is very complex; it'd be much easier for folks to use standalone ZeRO for training in bf16 mixed precision.
For that to work, ZeRO needs to support gradient accumulation in fp32, similar to the recently added BF16Optimizer.
So this is a feature request to backport BF16Optimizer's fp32 gradient accumulation to ZeRO-1, 2 and 3.
Once this is done I can safely point those who are interested in an easier solution to ZeRO.
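To make the motivation concrete, here is a small stdlib-only Python sketch (the `to_bf16` helper is a hypothetical emulation for illustration, not DeepSpeed code) showing why accumulating many small gradients directly in bf16 loses mass, while an fp32-style accumulator does not:

```python
import struct

def to_bf16(x: float) -> float:
    """Emulate bfloat16 by keeping only the top 16 bits of the float32
    representation (truncation; real hardware rounds to nearest, but the
    drift is the same in kind)."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    (y,) = struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))
    return y

grad = 1e-3          # a typical small per-step gradient contribution
steps = 1000

acc_bf16 = 0.0       # accumulator re-truncated to (emulated) bf16 each step
acc_fp32 = 0.0       # accumulator kept in full precision
for _ in range(steps):
    acc_bf16 = to_bf16(acc_bf16 + grad)
    acc_fp32 += grad

# bf16 has only 8 significant bits: once the accumulator reaches ~0.25,
# adding 1e-3 no longer changes it, so the bf16 sum stalls far below the
# true total of ~1.0.
print(f"bf16 accumulator: {acc_bf16:.4f}")  # stalls well below 1.0
print(f"fp32 accumulator: {acc_fp32:.4f}")  # ~1.0
```

This is exactly the failure mode that keeping the gradient accumulation buffers in fp32 avoids.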
@tjruwase, @jeffra
@tjruwase, would it be possible to implement this? We are ready to start using ZeRO-3/bf16 for the multi-modal training.
Thank you very much!
@stas00, is it better to close this or #2768? They are the same thing, right?
Hi Tunji - you're the owner so it's up to you to decide. The new one is a duplicate of this one, so typically the earliest one stays.
And I don't agree with the other request that it should be hardcoded to fp32; it should be a user choice. Though most likely fp32 is a sensible default for bf16 mixed precision training.
Makes sense, will close the newer one and reference this appropriately.
Yes, the accumulation type will be configurable. Hopefully, we should have a WIP pushed later this week. It would be great to get your usual feedback as we iterate on a solution.
Fantastic news, Tunji. Thank you.
And, yes, we would be happy to experiment with your WIP PR.
Amazing, thanks!
Hi, I've run into the same problem. @tjruwase, have you found a solution?
Great, looking forward to seeing this new release!
Any update on this @tjruwase?
Please see: https://github.com/microsoft/DeepSpeed/pull/2847
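For anyone landing here later: assuming the option shipped roughly as proposed in that PR, the knob is a `grad_accum_dtype` entry in the DeepSpeed config. The exact key layout below is my reading of the PR, so double-check against the current DeepSpeed configuration docs:

```json
{
  "bf16": { "enabled": true },
  "data_types": { "grad_accum_dtype": "fp32" },
  "zero_optimization": { "stage": 3 }
}
```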
Closing as completed by #2847.