Olatunji Ruwase


@wuxb45, @Heathcliff-Zhao, and @cloudwaysX thanks for reporting and triaging this issue.


@stas00, I chatted extensively with @tohtana, and perhaps I can provide some clarification here. I think `not officially supported` means `not anticipated` and `not tested`. We are both unsure of...

Fixed by #2989. Will open a new issue for #3202 as needed.

@hahchenchen and @DavdGao, unfortunately we don't have a tutorial for this. However, there are two options: 1. You can directly pass the torch [implementation](https://pytorch.org/docs/stable/generated/torch.optim.lr_scheduler.CosineAnnealingLR.html) into `deepspeed.initialize()`...
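A minimal sketch of option 1, assuming a placeholder model and a hypothetical minimal config dict (your actual model and DeepSpeed config will differ):

```python
import torch
import deepspeed

model = torch.nn.Linear(16, 16)  # placeholder model for illustration
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

# Construct the torch scheduler directly and hand it to DeepSpeed.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=1000)

ds_config = {"train_batch_size": 8}  # hypothetical minimal config

engine, optimizer, _, scheduler = deepspeed.initialize(
    model=model,
    optimizer=optimizer,
    lr_scheduler=scheduler,
    config=ds_config,
)
```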

@HsuWanTing, you can also pass the lr scheduler as a Callable, which should work for your case. Please see the following example: https://github.com/microsoft/DeepSpeed/blob/3dd7ccff8103be60c31d963dd2278d43abb68fd1/tests/unit/runtime/test_ds_initialize.py#L254
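A minimal sketch of the Callable form, with a placeholder model and config values; DeepSpeed invokes the Callable with the optimizer it constructs from the config, so the scheduler can wrap whatever optimizer DeepSpeed builds:

```python
import torch
import deepspeed

model = torch.nn.Linear(8, 8)  # placeholder model for illustration

# DeepSpeed builds the optimizer from the config, then calls this
# function with it to construct the scheduler.
def build_scheduler(optimizer):
    return torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=1000)

ds_config = {  # hypothetical minimal config
    "train_batch_size": 8,
    "optimizer": {"type": "AdamW", "params": {"lr": 1e-3}},
}

engine, optimizer, _, scheduler = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    lr_scheduler=build_scheduler,
    config=ds_config,
)
```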

@zhujc000, gradient checkpointing reduces activation memory consumption, which is different from the model/optimizer-state memory consumption that ZeRO-3 addresses. I think you found the right solution for the problem.
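A minimal sketch of how the two compose, with a placeholder model and a hypothetical minimal config: `torch.utils.checkpoint` trades activation memory for recomputation, while the ZeRO-3 config entry partitions parameter, gradient, and optimizer-state memory:

```python
import torch
import deepspeed
from torch.utils.checkpoint import checkpoint

# Placeholder model; checkpoint() drops block1's intermediate activations
# in the forward pass and recomputes them during backward.
class Net(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.block1 = torch.nn.Linear(1024, 1024)
        self.block2 = torch.nn.Linear(1024, 1024)

    def forward(self, x):
        x = checkpoint(self.block1, x, use_reentrant=False)
        return self.block2(x)

# ZeRO-3 partitions parameters, gradients, and optimizer states across
# data-parallel ranks; it does not touch activation memory, so the two
# techniques are complementary.
ds_config = {  # hypothetical minimal config; run under the deepspeed launcher
    "train_batch_size": 8,
    "zero_optimization": {"stage": 3},
    "optimizer": {"type": "Adam", "params": {"lr": 1e-3}},
}

model = Net()
engine, _, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)
```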

@clumsy, thanks for recreating the PR. It is a greatly appreciated contribution.

> Please let me know if more work is required for this change to get merged, @tjruwase.

Sorry, this slipped my mind. No more work is required. I have...

> Hi @tjruwase, it looks like the seemingly unrelated `TestHybridEngineTextGen` test keeps failing. Is this the reason why this change cannot be merged?

This test failure is preventing the auto-merge. But...