[BUG] DS-inference possible memory duplication
DS-inference runs out of memory more quickly for GPT2 than for BLOOM, even though the two models have a similar number of parameters.
See the tables for both models in the README.
@jeffra
Confirmed with @mayank31398 on Slack that DeepSpeed v0.7.5 was used here.
@mayank31398 When does the OOM error pop up? Is it during the call to deepspeed.init_inference? If so, I ran into a similar problem with DeepSpeed-MII recently where models would fit on GPU with the baseline HF model, but OOM when injecting kernels on the DeepSpeed side. The problem is that DS-Inference requires some temporary additional GPU memory when doing the kernel injection. To get around this, we keep the baseline model in system memory and allow deepspeed.init_inference to move the model to GPU.
Check out this PR and the linked issues: https://github.com/microsoft/DeepSpeed-MII/pull/105
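For reference, a minimal sketch of that workaround, assuming a Hugging Face causal-LM checkpoint (the model name, dtype, and mp_size below are placeholders, not the exact code from the PR):

```python
import torch
import deepspeed
from transformers import AutoModelForCausalLM

# Placeholder checkpoint; the benchmark uses a randomly initialized BLOOM 1.3B.
model_name = "bigscience/bloom-1b7"

# Keep the baseline model in system (CPU) memory so the temporary extra memory
# needed during kernel injection is not paid on top of a full GPU copy.
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)

# init_inference injects the fused inference kernels and moves the weights to GPU.
model = deepspeed.init_inference(
    model,
    mp_size=1,                      # tensor-parallel degree (placeholder)
    dtype=torch.float16,
    replace_with_kernel_inject=True,
)
```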
Init inference is fine; the OOM happens in the forward pass @mrwyattii
Hi @mayank31398,
I want to look into this. Can you please point me to the right script that I can run on my side? Thanks, Reza
@RezaYazdaniAminabadi @mrwyattii @jeffra https://github.com/bigcode-project/bigcode-inference-benchmark You can run
sh scripts/run_batch_size.sh ds-inference-1b-bloom-fp16
This will run BLOOM 1.3B (randomly initialized) using DS-inference in fp16, with batch sizes from 1 to 128 (doubling each step) and after that increasing in steps of 128.
For other models, you can look at the Makefile.
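For context, the sweep the script performs is roughly the following (a hedged sketch, not the actual benchmark code; the function name, prompt, and size limit are illustrative):

```python
import torch

def sweep_batch_sizes(model, tokenizer, max_batch_size=1024, device="cuda"):
    # Single short prompt, replicated along the batch dimension.
    inputs = tokenizer("hello world", return_tensors="pt").to(device)
    batch_size = 1
    while batch_size <= max_batch_size:
        batch = {k: v.repeat(batch_size, 1) for k, v in inputs.items()}
        try:
            with torch.no_grad():
                model(**batch)  # the reported OOM happens here, not in init_inference
            print(f"batch size {batch_size}: OK")
        except RuntimeError as e:
            if "out of memory" not in str(e).lower():
                raise
            print(f"batch size {batch_size}: OOM")
            break
        # Batch size doubles up to 128, then grows in steps of 128.
        batch_size = batch_size * 2 if batch_size < 128 else batch_size + 128
```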
Thanks @mayank31398 :)
@mayank31398, Reza mentioned he talked to you about this. Please re-open if the latest DeepSpeed does not resolve this.
@RezaYazdaniAminabadi Hi, I also ran into a problem similar to this issue. I described the details in https://github.com/microsoft/DeepSpeed/issues/3182. Could you take a look when you have time?