
AutoModelForCausalLM (Bloom) not releasing GPU memory after each inference batch [BUG]

Open · chuckhope opened this issue 1 year ago · 0 comments

Describe the bug
Hi there, I have wrapped inference in torch.no_grad() and call torch.cuda.empty_cache() after each batch, but the GPU still hits out-of-memory (OOM) errors after a few inference batches. My versions: torch 1.13.1, deepspeed 0.9, transformers 4.28, CUDA driver 11.6, on a V100.
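The report does not include a repro script, so the following is only a minimal sketch of the inference loop being described, assuming a small Bloom checkpoint wrapped with DeepSpeed's init_inference; the model name, prompts, and generation parameters are illustrative placeholders, not taken from the report:

```python
import torch
import deepspeed
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical setup; the issue does not include the actual script.
model_name = "bigscience/bloom-560m"  # placeholder checkpoint for illustration
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Wrap with DeepSpeed inference (assumed, since the issue is filed against
# DeepSpeed; kwargs follow the 0.9-era init_inference API).
ds_engine = deepspeed.init_inference(
    model,
    mp_size=1,
    dtype=torch.half,
    replace_with_kernel_inject=True,
)

prompts = ["Hello, world!", "DeepSpeed inference test"]  # placeholder batches

for prompt in prompts:
    inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
    with torch.no_grad():  # no autograd graph is retained during generation
        outputs = ds_engine.module.generate(**inputs, max_new_tokens=32)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))
    del inputs, outputs        # drop tensor references so blocks can be reused
    torch.cuda.empty_cache()   # return cached, unreferenced blocks to the driver
```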

Expected behavior
The memory should be released automatically after each inference batch.

ds_report output
All good.

chuckhope · May 11 '23 08:05