Sync 4 layer norms - bf16, fp32, optimizer states on restart
This PR uses https://github.com/microsoft/DeepSpeed/pull/1801 @ d911e67 to sync layer norms:
- for bf16 weights
- for fp32 weights in bf16 optimizer
- for the 2 optimizer states
all_reduce with ReduceOp.AVG is used in all 3 cases.
This automatically works for all layers and all 4 types of layer norms, covering both their weights and biases.
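
For illustration, here is a minimal sketch of what such a sync can look like in plain PyTorch. This is not the actual code from the PR: `get_fp32_param` is a hypothetical hook standing in for however the bf16 optimizer exposes its fp32 master copies, the name-matching rule is an assumption about the model's parameter names, and the `exp_avg` / `exp_avg_sq` keys assume an Adam-style optimizer. It also assumes PyTorch >= 1.11 for `ReduceOp.AVG`.

```python
import torch
import torch.distributed as dist


def _avg_across_replicas(tensor: torch.Tensor, group=None) -> None:
    # Every data-parallel rank contributes its copy; the result is the mean.
    dist.all_reduce(tensor, op=dist.ReduceOp.AVG, group=group)


def sync_layer_norms(model, optimizer, get_fp32_param, group=None) -> None:
    for name, param in model.named_parameters():
        # Catch all layer-norm weights and biases by name (hypothetical
        # matching rule covering "layernorm" and "layer_norm" spellings).
        if "layernorm" not in name.lower().replace("_", ""):
            continue

        # 1. bf16 weights
        _avg_across_replicas(param.data, group)

        # 2. fp32 master weights held by the bf16 optimizer
        #    (get_fp32_param is an assumed accessor, not a DeepSpeed API)
        fp32 = get_fp32_param(param)
        _avg_across_replicas(fp32.data, group)

        # 3. the 2 optimizer states (Adam-style; keyed on the fp32 param here)
        state = optimizer.state[fp32]
        _avg_across_replicas(state["exp_avg"], group)
        _avg_across_replicas(state["exp_avg_sq"], group)
```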
This has been successfully applied to the live model and led to the layers getting back in sync. However, after some iterations some layers drifted out of sync again, so there is another bug still to figure out.