
Big batch size causes OOM in bloom-ds-inference.py; how do I adjust the max_split_size_mb value?

Open tohneecao opened this issue 2 years ago • 1 comment

```
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 62.00 MiB (GPU 6; 79.19 GiB total capacity; 66.51 GiB already allocated; 61.56 MiB free; 67.77 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

    return forward_call(*input, **kwargs)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 62.00 MiB (GPU 4; 79.19 GiB total capacity; 66.51 GiB already allocated; 61.56 MiB free; 67.77 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
```

tohneecao · Apr 27 '23 06:04
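For reference (not part of the original thread): max_split_size_mb is a setting of PyTorch's caching CUDA allocator, configured through the PYTORCH_CUDA_ALLOC_CONF environment variable, and it must be set before the process makes its first CUDA allocation. A minimal sketch; the value 128 MiB is an illustrative guess, not a tuned recommendation:

```python
# Sketch: configure the caching allocator before torch touches CUDA.
# max_split_size_mb caps the size of blocks the allocator will split,
# which can reduce fragmentation when reserved memory >> allocated memory.
import os

os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"  # illustrative value

import torch  # imported after setting the env var so the allocator sees it

x = torch.ones(1, device="cuda")  # first CUDA allocation now uses the new config
```

With the deepspeed launcher, the equivalent is exporting the variable in the launching shell, e.g. `PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128 deepspeed bloom-ds-inference.py ...` (launcher flags elided), since spawned ranks inherit the environment.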

I don't think max_split_size_mb will work with DeepSpeed inference; it only applies to memory managed by PyTorch's native caching allocator.

mayank31398 · May 10 '23 01:05
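Whatever the launcher, a quick way to check whether the failure is fragmentation (cached-but-unusable blocks) rather than true exhaustion is to compare what the PyTorch allocator has reserved against what is actually allocated. A minimal diagnostic sketch using the standard torch.cuda memory stats; note that, per the comment above, DeepSpeed's custom inference kernels may allocate outside this allocator, in which case max_split_size_mb will not help:

```python
# Sketch: report per-GPU allocator stats. A large reserved-minus-allocated
# gap suggests fragmentation in PyTorch's caching allocator, which is the
# case max_split_size_mb is meant to address.
import torch

def report_gpu_memory(device: int = 0) -> None:
    allocated = torch.cuda.memory_allocated(device) / 2**30  # GiB in live tensors
    reserved = torch.cuda.memory_reserved(device) / 2**30    # GiB held by the allocator
    print(f"GPU {device}: allocated {allocated:.2f} GiB, "
          f"reserved {reserved:.2f} GiB, gap {reserved - allocated:.2f} GiB")

for d in range(torch.cuda.device_count()):
    report_gpu_memory(d)
```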