DeepSpeed
Add a grad norm query for the FP16 fused and unfused optimizers.
This still needs FP32 and ZeRO support, and unit tests :-).
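For readers unfamiliar with the quantity being queried, here is a rough sketch of a global gradient norm: the L2 norm taken over every parameter's gradient at once. It is not the code from this patch. Plain Python lists stand in for gradient tensors, and the name `global_grad_norm` and its `loss_scale` parameter are hypothetical, chosen only for illustration. The division by `loss_scale` assumes the FP16 gradients are still multiplied by the loss scale when the norm is taken.

```python
import math
from typing import Iterable, Optional, Sequence


def global_grad_norm(
    grads: Iterable[Optional[Sequence[float]]],
    loss_scale: float = 1.0,  # hypothetical: factor applied to the loss in FP16 training
) -> float:
    """Return the L2 norm over all gradient values, undoing the loss scale."""
    total_sq = 0.0
    for g in grads:
        if g is None:  # parameter received no gradient this step
            continue
        total_sq += sum(x * x for x in g)
    # Scaling every gradient by loss_scale scales the norm by the same factor.
    return math.sqrt(total_sq) / loss_scale


# Two hypothetical parameter gradients plus one parameter without a gradient.
example_grads = [[3.0, 0.0], None, [4.0]]
norm = global_grad_norm(example_grads)
```

A single sum of squares is accumulated across all parameters before the square root, so the result is one number for the whole model rather than one per parameter.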
Can one of the admins verify this patch?
This ended up being added in a later PR.