Olatunji Ruwase

Results 634 comments of Olatunji Ruwase

@lxd551326, it seems you are seeing two different issues. 1. CUDA OOM using DeepSpeed for a model that works with pure PyTorch is very strange and should be investigated. Can you...

@lhyscau, @DavidYanAnDe, and @lxd551326 are you able to provide repro steps?

@lqniunjunlper, are you able to share repro steps for this issue? Thanks

@Taiinguyenn139, thanks for helping to resolve this issue. Closing this issue.

@exnx, thanks for debugging this issue. Your analysis is correct. The purpose of that assertion is to confirm the existence of at least one `layer_*` file if using pipeline parallelism....
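As a rough illustration of the kind of check described above, the sketch below tests whether a checkpoint directory contains at least one file whose name starts with `layer_`. The function name `has_layer_files`, the directory layout, and the example file name are hypothetical and chosen only for illustration; this is not DeepSpeed's actual assertion.

```python
import glob
import os
import tempfile


def has_layer_files(checkpoint_dir: str) -> bool:
    """Return True if checkpoint_dir holds at least one file named layer_*.

    Hypothetical helper: it only mirrors the idea of the assertion,
    not the real DeepSpeed implementation.
    """
    pattern = os.path.join(checkpoint_dir, "layer_*")
    return len(glob.glob(pattern)) > 0


def demo() -> tuple:
    """Check an empty directory, then the same directory with one layer file."""
    with tempfile.TemporaryDirectory() as ckpt_dir:
        before = has_layer_files(ckpt_dir)
        # The file name below is an assumed example, not a guaranteed format.
        open(os.path.join(ckpt_dir, "layer_00-model_states.pt"), "w").close()
        after = has_layer_files(ckpt_dir)
    return before, after
```

A caller using pipeline parallelism could assert on `has_layer_files(path)` before loading, and skip the check otherwise.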

@Looong01, it seems your `localhost` is not configured for password-less ssh, which is a requirement for DeepSpeed. See https://www.deepspeed.ai/getting-started/#resource-configuration-multi-node for details. Although you are using a single node, the `--autotuning` option operates as...

@gawain000000, can you clarify your goals? There are two different solutions for latency and throughput (and low budget) scenarios. I noticed the use of `deepspeed.init_inference` and ZeRO stage 3...

@Xiang-cd, gradient accumulation in DeepSpeed works as follows: 1. Assume each training [iteration](https://www.deepspeed.ai/getting-started/#training) consists of fwd, bwd, step. 2. Increment [micro-step counter](https://github.com/microsoft/DeepSpeed/blob/2a56f53395b2e0ef2ffe9947671fe153ba026328/deepspeed/runtime/engine.py#L2279) in step, and use configured gradient accumulation steps...
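The counting scheme in the two steps above can be sketched in plain Python. Because the comment is truncated, the boundary rule used here (apply an optimizer update whenever the micro-step count is a multiple of the configured accumulation steps) is the conventional one and is an assumption, not a quote of DeepSpeed's code. The class `MicroStepCounter` and its attributes are hypothetical names.

```python
class MicroStepCounter:
    """Toy model of micro-step counting for gradient accumulation.

    Assumption: an optimizer update happens once every
    `gradient_accumulation_steps` calls to step(). This is a sketch,
    not the DeepSpeed engine.
    """

    def __init__(self, gradient_accumulation_steps: int):
        if gradient_accumulation_steps < 1:
            raise ValueError("gradient_accumulation_steps must be >= 1")
        self.gradient_accumulation_steps = gradient_accumulation_steps
        self.micro_steps = 0        # incremented on every step() call
        self.optimizer_updates = 0  # incremented only at a boundary

    def step(self) -> bool:
        """Count one fwd/bwd/step iteration; return True at a boundary."""
        self.micro_steps += 1
        at_boundary = self.micro_steps % self.gradient_accumulation_steps == 0
        if at_boundary:
            # A real engine would apply the accumulated gradients
            # and zero them here.
            self.optimizer_updates += 1
        return at_boundary


# Eight iterations with 4 accumulation steps.
counter = MicroStepCounter(gradient_accumulation_steps=4)
boundaries = [counter.step() for _ in range(8)]
```

With these settings, `boundaries` is true only on the 4th and 8th iterations, so eight micro-steps produce two optimizer updates.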