Olatunji Ruwase


@wuxb45, @Heathcliff-Zhao, and @cloudwaysX thanks for reporting and triaging this issue.


@stas00, I chatted extensively with @tohtana, and perhaps I can provide some clarification here. I think `not officially supported` means `not anticipated` and `not tested`. We are both unsure of...

Fixed by #2989. Will open a new issue for #3202 as needed.

@hahchenchen and @DavdGao, unfortunately we don't have a tutorial for this. However, there are two options: 1. You can directly pass the torch [implementation](https://pytorch.org/docs/stable/generated/torch.optim.lr_scheduler.CosineAnnealingLR.html) into `deepspeed.initialize()`...
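A minimal sketch of option 1, assuming a placeholder model and a hypothetical minimal config dict (your actual model and DeepSpeed config will differ):

```python
import torch
import deepspeed

model = torch.nn.Linear(16, 16)  # placeholder model for illustration
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

# Construct the torch scheduler directly and hand it to DeepSpeed.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=1000)

ds_config = {"train_batch_size": 8}  # hypothetical minimal config

engine, optimizer, _, scheduler = deepspeed.initialize(
    model=model,
    optimizer=optimizer,
    lr_scheduler=scheduler,
    config=ds_config,
)
```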

@HsuWanTing, you can also pass the lr scheduler as a Callable, which should work for your case. Please see the following example: https://github.com/microsoft/DeepSpeed/blob/3dd7ccff8103be60c31d963dd2278d43abb68fd1/tests/unit/runtime/test_ds_initialize.py#L254
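A minimal sketch of the Callable form, with a placeholder model and config values; DeepSpeed invokes the Callable with the optimizer it constructs from the config, so the scheduler can wrap whatever optimizer DeepSpeed builds:

```python
import torch
import deepspeed

model = torch.nn.Linear(8, 8)  # placeholder model for illustration

# DeepSpeed builds the optimizer from the config, then calls this
# function with it to construct the scheduler.
def build_scheduler(optimizer):
    return torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=1000)

ds_config = {  # hypothetical minimal config
    "train_batch_size": 8,
    "optimizer": {"type": "AdamW", "params": {"lr": 1e-3}},
}

engine, optimizer, _, scheduler = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    lr_scheduler=build_scheduler,
    config=ds_config,
)
```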

@zhujc000, gradient checkpointing reduces activation memory consumption, which is different from the model/optimizer-state memory consumption that ZeRO-3 addresses. I think you found the right solution for the problem.
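A minimal sketch of how the two compose, with a placeholder model and a hypothetical minimal config: `torch.utils.checkpoint` trades activation memory for recomputation, while the ZeRO-3 config entry partitions parameter, gradient, and optimizer-state memory:

```python
import torch
import deepspeed
from torch.utils.checkpoint import checkpoint

# Placeholder model; checkpoint() drops block1's intermediate activations
# in the forward pass and recomputes them during backward.
class Net(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.block1 = torch.nn.Linear(1024, 1024)
        self.block2 = torch.nn.Linear(1024, 1024)

    def forward(self, x):
        x = checkpoint(self.block1, x, use_reentrant=False)
        return self.block2(x)

# ZeRO-3 partitions parameters, gradients, and optimizer states across
# data-parallel ranks; it does not touch activation memory, so the two
# techniques are complementary.
ds_config = {  # hypothetical minimal config; run under the deepspeed launcher
    "train_batch_size": 8,
    "zero_optimization": {"stage": 3},
    "optimizer": {"type": "Adam", "params": {"lr": 1e-3}},
}

model = Net()
engine, _, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)
```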

@clumsy, thanks for recreating the PR. It is a greatly appreciated contribution.

> Please let me know if more work is required for this change to get merged, @tjruwase.

Sorry, this slipped my mind. No more work is required. I have...

> Hi @tjruwase, it looks like the seemingly unrelated `TestHybridEngineTextGen` test keeps failing. Is this the reason why this change cannot be merged?

This test failure is preventing the auto-merge. But...