llama: How to run the 13B model on 4 x 16 GB V100 GPUs?

Open qwer10 opened this issue 1 year ago • 2 comments

RuntimeError: CUDA out of memory. Tried to allocate 160.00 MiB (GPU 0; 15.78 GiB total capacity; 14.26 GiB already allocated; 121.19 MiB free; 14.69 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 143) of binary: /opt/conda/envs/torch1.12/bin/python
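For context, here is a rough back-of-the-envelope sketch of the memory arithmetic behind the error above. It is only an estimate under stated assumptions: roughly 13e9 parameters stored as fp16 (2 bytes each) and split evenly across GPUs. The helper `weight_gib_per_gpu` and the shard counts are hypothetical and not part of the llama code. The three memory figures are copied from the error message.

```python
GIB = 1024 ** 3


def weight_gib_per_gpu(n_params: float, bytes_per_param: int, n_shards: int) -> float:
    """Approximate weight memory per GPU, assuming an even split across n_shards."""
    return n_params * bytes_per_param / n_shards / GIB


# Figures copied from the error message (in GiB).
total, allocated, reserved = 15.78, 14.26, 14.69

# Assumption: ~13e9 parameters in fp16 (2 bytes per parameter).
for shards in (1, 2, 4):
    per_gpu = weight_gib_per_gpu(13e9, 2, shards)
    print(f"{shards} shard(s): ~{per_gpu:.1f} GiB of weights per GPU")

# Memory reserved by the caching allocator but not currently allocated.
# A small value suggests fragmentation is not the main cause of the failure.
slack = reserved - allocated
print(f"allocator slack: {slack:.2f} GiB of {total:.2f} GiB total")
```

Under these assumptions, a two-way split already places about 12 GiB of weights on each 16 GB card before any activations or caches are allocated. Reserved memory is also only slightly above allocated memory, so tuning `max_split_size_mb` alone seems unlikely to help.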

Mar 02 '23 07:03 qwer10
