Olatunji Ruwase
Closing for lack of response. Please re-open if needed.
@lxd551326, it seems you are seeing two different issues. 1. CUDA OOM using DeepSpeed for a model that works with pure PyTorch is very strange and should be investigated. Can you...
@lhyscau, @DavidYanAnDe, and @lxd551326 are you able to provide repro steps?
@lqniunjunlper, are you able to share repro steps for this issue? Thanks
@Taiinguyenn139, thanks for helping to resolve this issue. Closing this issue.
@exnx, thanks for debugging this issue. Your analysis is correct. The purpose of that assertion is to confirm the existence of at least one `layer_*` file when using pipeline parallelism....
@Looong01, it seems your `localhost` is not configured for password-less ssh, which is a requirement for DeepSpeed. Please see https://www.deepspeed.ai/getting-started/#resource-configuration-multi-node Although you are using a single node, the `--autotuning` option operates as...
Closing for lack of activity. Please re-open if needed.
@gawain000000, can you clarify your goals? There are two different solutions for latency-oriented and throughput-oriented (and low-budget) scenarios. I noticed the use of `deepspeed.init_inference` and zero stage 3...
@Xiang-cd, gradient accumulation in DeepSpeed works as follows: 1. Assume each training [iteration](https://www.deepspeed.ai/getting-started/#training) consists of fwd, bwd, step. 2. Increment the [micro-step counter](https://github.com/microsoft/DeepSpeed/blob/2a56f53395b2e0ef2ffe9947671fe153ba026328/deepspeed/runtime/engine.py#L2279) in step, and use the configured gradient accumulation steps...
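The micro-step mechanism described above can be sketched as follows. This is a minimal illustration, not DeepSpeed's actual implementation; the class, method, and attribute names (`EngineSketch`, `is_gradient_accumulation_boundary`, etc.) are assumptions chosen to mirror the idea of a micro-step counter gating the optimizer update:

```python
# Minimal sketch of gradient-accumulation gating via a micro-step counter.
# Not DeepSpeed's real engine; names here are illustrative assumptions.

class EngineSketch:
    def __init__(self, gradient_accumulation_steps):
        self.gas = gradient_accumulation_steps
        self.micro_steps = 0    # incremented on every engine.step() call
        self.global_steps = 0   # incremented only at accumulation boundaries

    def is_gradient_accumulation_boundary(self):
        # True on every gas-th micro-step (i.e., when gradients for a full
        # effective batch have been accumulated).
        return (self.micro_steps + 1) % self.gas == 0

    def step(self):
        # fwd/bwd have already accumulated gradients before step() is called.
        if self.is_gradient_accumulation_boundary():
            # A real engine would run optimizer.step() and zero the grads here.
            self.global_steps += 1
        self.micro_steps += 1


engine = EngineSketch(gradient_accumulation_steps=4)
for _ in range(8):  # 8 micro-steps with gas=4 -> 2 optimizer updates
    engine.step()
print(engine.global_steps)  # 2
```

So with `gradient_accumulation_steps=4`, only every fourth call to `step()` performs an actual optimizer update; the other calls simply accumulate gradients and advance the micro-step counter.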