Olatunji Ruwase
> Still no luck, @tjruwase . Is there a known issue with CUDA OOM in DeepSpeed tests? I only added a few layers to SimpleMoEModel that is not that widely...
The `offload_fp32_gradients` that are swapped out are the [fp32 versions](https://github.com/microsoft/DeepSpeed/blob/c88af2143248e4655d401f9231317f3c76018057/deepspeed/runtime/zero/stage3.py#L1247) of the gradients needed for `optimizer.step()` computation. On the other hand, `self.__param_id_to_grad_partition` holds 16-bit gradients.
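For intuition only (this is a generic mixed-precision sketch, not DeepSpeed internals, and all names are placeholders), this is why a 32-bit copy of each gradient has to exist at `optimizer.step()` time even though the model and its gradients are kept in 16-bit:

```python
# Minimal sketch, not DeepSpeed code: the optimizer update is done on fp32
# master copies to avoid precision loss, so fp32 gradients are materialized.
import torch

param_fp16 = torch.randn(4, dtype=torch.float16)   # 16-bit model weight
master_fp32 = param_fp16.float()                    # fp32 master copy of the weight
grad_fp16 = torch.randn(4, dtype=torch.float16)     # 16-bit gradient from backward()

grad_fp32 = grad_fp16.float()                       # fp32 gradient used by the update
lr = 1e-3
master_fp32 -= lr * grad_fp32                       # fp32 weight update
param_fp16.copy_(master_fp32.half())                # write back to the 16-bit weight
```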
The swap mode is enabled when the optimizer state is offloaded to NVMe because both GPU and CPU memory are too small. In that case, there is little benefit to keeping...
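For context, a minimal sketch of a configuration that puts ZeRO-3 into that mode; the NVMe path and batch size are placeholders:

```python
# Sketch of a DeepSpeed config that offloads optimizer state to NVMe under ZeRO-3.
ds_config = {
    "train_batch_size": 8,
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,
        "offload_optimizer": {
            "device": "nvme",          # spill optimizer state to NVMe
            "nvme_path": "/local_nvme",  # placeholder path to fast local storage
            "pin_memory": True,
        },
    },
}

# engine, optimizer, _, _ = deepspeed.initialize(
#     model=model, model_parameters=model.parameters(), config=ds_config)
```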
@platoonpluto, thanks for the clarification. Yes, your observation is correct: we could avoid the swap overhead since the 16-bit gradients are always available in `self.__param_id_to_grad_partition`. One reason for my...
@Yejing-Lai, please help resolve the conflict.
> Hi @tjruwase is this PR under review state or merge state? We are working on Intel Extension for PyTorch release and want to know whether this PR will be...
@jpatel-bdai, all zero stages are expected to match ddp on single gpu runs. So, it appears that you are hitting bugs in zero. Are you able to share detailed steps...
Ideally, we expect zero stages to match ddp in multi-gpu runs, since zero is designed to be a memory-efficient ddp algorithm. In terms of debugging, a first step would be...
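For illustration only (not a prescribed recipe), a sketch of that kind of comparison on a single GPU: stage 0 is DeepSpeed's plain data-parallel mode, so the higher stages can be checked against it. The toy model, data, and hyperparameters are placeholders, and this assumes a recent DeepSpeed release that supports ZeRO with fp32 parameters.

```python
# Sketch: run the same toy model under ZeRO stage 0 (plain data parallel) and
# under higher ZeRO stages, then compare the loss curves.
# Launch with: deepspeed --num_gpus 1 check_zero.py
import torch
import deepspeed

def run(stage):
    torch.manual_seed(0)
    model = torch.nn.Linear(16, 1)
    ds_config = {
        "train_batch_size": 32,
        "optimizer": {"type": "SGD", "params": {"lr": 0.1}},
        "zero_optimization": {"stage": stage},
    }
    engine, _, _, _ = deepspeed.initialize(
        model=model, model_parameters=model.parameters(), config=ds_config)
    torch.manual_seed(1)
    data = torch.randn(32, 16, device=engine.device)
    target = torch.randn(32, 1, device=engine.device)
    for _ in range(10):
        loss = torch.nn.functional.mse_loss(engine(data), target)
        engine.backward(loss)
        engine.step()
    return loss.item()

# Stage 0 is the data-parallel baseline; stages 1-3 should track it closely.
for stage in (0, 1, 2, 3):
    print(f"stage {stage}: final loss = {run(stage):.6f}")
```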
@Orion-Zheng, are you still having this issue?
> Some more background: I'm working on the RWKV project, a fork, where they save the weights with a copy of `zero_to_fp32.py`.

@freckletonj, apologies for the delayed response here. Is...
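For reference, a sketch of the usual ways to consolidate a ZeRO checkpoint with `zero_to_fp32.py`; the paths below are placeholders:

```python
# Sketch: consolidating a ZeRO checkpoint into a single fp32 state dict.
# "checkpoints/" and "pytorch_model.bin" are placeholder paths.

# Option 1: the standalone script DeepSpeed saves alongside the checkpoint.
#   python zero_to_fp32.py checkpoints/ pytorch_model.bin

# Option 2: the same logic from Python.
from deepspeed.utils.zero_to_fp32 import get_fp32_state_dict_from_zero_checkpoint

state_dict = get_fp32_state_dict_from_zero_checkpoint("checkpoints/")  # fp32 weights on CPU
# model.load_state_dict(state_dict)  # load into a CPU copy of the model
```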