llama
Failure on A100 40GB
Hi, I've been trying to run the example inference script with the 7B model weights, but I get:
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 86.00 MiB (GPU 0; 39.59 GiB total capacity; 27.26 GiB already allocated; 24.19 MiB free; 27.26 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
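The traceback itself suggests setting max_split_size_mb. Is this the right way to set it? (The value 128 is just a guess on my part, not something from the docs.)

```shell
# Hypothetical value; the error message only says to set max_split_size_mb,
# it doesn't say what a sensible number is.
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128
```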
Is there anything I can do about this, e.g. changing the numeric type? If so, how?
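To illustrate what I mean by changing the numeric type: a minimal sketch of the idea, assuming plain PyTorch half-precision casting applies here (the tensor below is a stand-in, not the actual llama weight-loading code):

```python
import torch

# Sketch only: casting to float16 halves the per-element memory footprint.
# Presumably the same would apply to the model weights if llama supports it.
weights_fp32 = torch.zeros(1024, 1024, dtype=torch.float32)
weights_fp16 = weights_fp32.half()

print(weights_fp32.element_size())  # 4 bytes per element
print(weights_fp16.element_size())  # 2 bytes per element
```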
Also: can I use more than one GPU?
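For context on the multi-GPU part: my understanding from the README (which may be wrong) is that the launch command controls how many GPUs are used via model parallelism, so I'm wondering whether raising that number would split the 7B model across cards:

```shell
# --nproc_per_node is the model-parallel size (MP); 7B reportedly uses MP=1.
# The paths are placeholders for wherever the weights were downloaded.
torchrun --nproc_per_node 1 example.py \
    --ckpt_dir ./7B \
    --tokenizer_path ./tokenizer.model
```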