DeepSpeed
AutoModelForCausalLM for Bloom not releasing GPU memory after each inference batch [BUG]
Describe the bug
Hi there, I run inference under torch.no_grad() and call torch.cuda.empty_cache() after each batch, but the GPU still hits out-of-memory (OOM) errors after a few inference batches.

Environment: torch 1.13.1, deepspeed 0.9, transformers 4.28, CUDA driver 11.6, on a V100.
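For reference, a minimal sketch of the kind of inference loop described above. The checkpoint name, input batches, and DeepSpeed init arguments are placeholder assumptions for illustration, not the actual repro code:

```python
import torch
import deepspeed
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "bigscience/bloom-7b1"  # placeholder checkpoint, not necessarily the one in use

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, torch_dtype=torch.float16)
model.eval()

# Wrap with DeepSpeed inference; these arguments are assumed for illustration.
engine = deepspeed.init_inference(model, mp_size=1, dtype=torch.float16,
                                  replace_with_kernel_inject=True)
model = engine.module

prompts = [["Hello, world"], ["DeepSpeed inference test"]]  # placeholder batches

for batch in prompts:
    inputs = tokenizer(batch, return_tensors="pt", padding=True).to("cuda")
    with torch.no_grad():                 # no autograd graph is built during generation
        outputs = model.generate(**inputs, max_new_tokens=32)
    print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
    del inputs, outputs                   # drop the last references to the batch tensors
    torch.cuda.empty_cache()              # return cached blocks to the CUDA allocator
```

Even with this pattern, allocated memory keeps growing across batches until OOM.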
Expected behavior
GPU memory should be released automatically after each inference batch.

ds_report output
All checks pass, no issues reported.