DeepSpeed uniform deepspeed overflow check

Open GuanhuaWang opened this issue 10 months ago • 1 comment

Before: Overflow checks are scattered and duplicated in many places.

This PR:

Adds a single interface, the CheckOverflow class, which abstracts and unifies the overflow check across ZeRO, ZeRO-Offload, Pipeline Parallelism, and BF16_optimizer.
Skips the step() operation in BF16_optimizer if gradient overflow is detected (avoids polluting checkpoints, etc.).
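
To illustrate the two points above, here is a minimal sketch of the pattern: one shared overflow helper, and an optimizer that skips its update when the helper reports inf/NaN gradients. It is not the actual CheckOverflow implementation. `OverflowChecker`, `ToyOptimizer`, and `skipped_steps` are hypothetical names, and gradients are plain Python lists of floats rather than tensors so the example stays self-contained.

```python
import math
from typing import Iterable, List, Optional


class OverflowChecker:
    """Hypothetical shared overflow check (not DeepSpeed's CheckOverflow API)."""

    @staticmethod
    def has_overflow(grads: Iterable[Optional[List[float]]]) -> bool:
        # A gradient overflows if any element is inf or NaN.
        # None stands for a parameter that received no gradient.
        return any(
            not math.isfinite(value)
            for grad in grads
            if grad is not None
            for value in grad
        )


class ToyOptimizer:
    """Toy SGD optimizer that skips the update when gradients overflow."""

    def __init__(self, params: List[List[float]], lr: float = 0.1) -> None:
        self.params = params
        self.lr = lr
        self.skipped_steps = 0

    def step(self, grads: List[Optional[List[float]]]) -> bool:
        # Check once, through the shared helper, before touching any parameter.
        if OverflowChecker.has_overflow(grads):
            self.skipped_steps += 1
            return False  # parameters are left untouched
        for param, grad in zip(self.params, grads):
            if grad is None:
                continue
            for i, g in enumerate(grad):
                param[i] -= self.lr * g
        return True
```

Because the check runs before any parameter is modified, a skipped step leaves the optimizer state exactly as it was, so a checkpoint saved afterwards contains no inf/NaN values from that step.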

cc @tjruwase

Apr 16 '24 22:04 GuanhuaWang

Why not use tensor.isnan() and tensor.isinf()?

Apr 20 '24 07:04 Anhelor