DeepSpeed
AutoModelForCausalLM for Bloom not releasing GPU memory after each inference batch [BUG]
Describe the bug
Hi there, I run inference under torch.no_grad() and call torch.cuda.empty_cache() after each batch, but the GPU still hits out-of-memory (OOM) errors after a few inference batches.

Environment: torch 1.13.1, deepspeed 0.9, transformers 4.28, CUDA driver 11.6, on a V100.
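For reference, a minimal sketch of the kind of inference loop described above. The checkpoint name, input batches, and DeepSpeed init arguments are placeholder assumptions for illustration, not the actual repro code:

```python
import torch
import deepspeed
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "bigscience/bloom-7b1"  # placeholder checkpoint, not necessarily the one in use

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, torch_dtype=torch.float16)
model.eval()

# Wrap with DeepSpeed inference; these arguments are assumed for illustration.
engine = deepspeed.init_inference(model, mp_size=1, dtype=torch.float16,
                                  replace_with_kernel_inject=True)
model = engine.module

prompts = [["Hello, world"], ["DeepSpeed inference test"]]  # placeholder batches

for batch in prompts:
    inputs = tokenizer(batch, return_tensors="pt", padding=True).to("cuda")
    with torch.no_grad():                 # no autograd graph is built during generation
        outputs = model.generate(**inputs, max_new_tokens=32)
    print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
    del inputs, outputs                   # drop the last references to the batch tensors
    torch.cuda.empty_cache()              # return cached blocks to the CUDA allocator
```

Even with this pattern, allocated memory keeps growing across batches until OOM.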
Expected behavior
GPU memory should be released automatically after each inference batch.

ds_report output
All checks pass, no issues reported.