[BUG] DS-inference possible memory duplication
DS-inference runs out of memory more quickly for GPT2 than for BLOOM, even though the two models have a similar number of parameters.
See the tables for both models in the README.
@jeffra
Confirmed with @mayank31398 on Slack that DeepSpeed v0.7.5 was used here.
@mayank31398 When does the OOM error pop up? Is it during the call to deepspeed.init_inference? If so, I ran into a similar problem with DeepSpeed-MII recently where models would fit on GPU with the baseline HF model, but OOM when injecting kernels on the DeepSpeed side. The problem is that DS-Inference requires some temporary additional GPU memory when doing the kernel injection. To get around this, we keep the baseline model in system memory and allow deepspeed.init_inference to move the model to GPU.
Check out this PR and the linked issues: https://github.com/microsoft/DeepSpeed-MII/pull/105
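For reference, a minimal sketch of that workaround, assuming a Hugging Face causal-LM checkpoint (the model name, dtype, and mp_size below are placeholders, not the exact code from the PR):

```python
import torch
import deepspeed
from transformers import AutoModelForCausalLM

# Placeholder checkpoint; the benchmark uses a randomly initialized BLOOM 1.3B.
model_name = "bigscience/bloom-1b7"

# Keep the baseline model in system (CPU) memory so the temporary extra memory
# needed during kernel injection is not paid on top of a full GPU copy.
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)

# init_inference injects the fused inference kernels and moves the weights to GPU.
model = deepspeed.init_inference(
    model,
    mp_size=1,                      # tensor-parallel degree (placeholder)
    dtype=torch.float16,
    replace_with_kernel_inject=True,
)
```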
Init inference is fine; the OOM happens in the forward pass @mrwyattii
Hi @mayank31398,
I want to look into this. Can you please point me to the right script that I can run on my side? Thanks, Reza
@RezaYazdaniAminabadi @mrwyattii @jeffra https://github.com/bigcode-project/bigcode-inference-benchmark You can run
sh scripts/run_batch_size.sh ds-inference-1b-bloom-fp16
This will run BLOOM 1.3B (randomly initialized) using DS-inference in fp16, with batch sizes from 1 to 128 (doubling each step) and after that increasing in steps of 128.
For other models, you can look at the Makefile.
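For context, the sweep the script performs is roughly the following (a hedged sketch, not the actual benchmark code; the function name, prompt, and size limit are illustrative):

```python
import torch

def sweep_batch_sizes(model, tokenizer, max_batch_size=1024, device="cuda"):
    # Single short prompt, replicated along the batch dimension.
    inputs = tokenizer("hello world", return_tensors="pt").to(device)
    batch_size = 1
    while batch_size <= max_batch_size:
        batch = {k: v.repeat(batch_size, 1) for k, v in inputs.items()}
        try:
            with torch.no_grad():
                model(**batch)  # the reported OOM happens here, not in init_inference
            print(f"batch size {batch_size}: OK")
        except RuntimeError as e:
            if "out of memory" not in str(e).lower():
                raise
            print(f"batch size {batch_size}: OOM")
            break
        # Batch size doubles up to 128, then grows in steps of 128.
        batch_size = batch_size * 2 if batch_size < 128 else batch_size + 128
```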
Thanks @mayank31398 :)
@mayank31398, Reza mentioned he talked to you about this. Please re-open if the latest DeepSpeed does not resolve this.
@RezaYazdaniAminabadi Hi, I also ran into a problem similar to this issue. I described the details in https://github.com/microsoft/DeepSpeed/issues/3182. Could you take a look when you have time?