
Failure on A100 32GB

Open vincenzoml opened this issue 1 year ago • 11 comments

Hi, I've been trying to run the example inference using the 7B model weights, but I get:

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 86.00 MiB (GPU 0; 39.59 GiB total capacity; 27.26 GiB already allocated; 24.19 MiB free; 27.26 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Is there anything I can do about this, e.g. changing the numeric type? If so, how?

Also: can I use more than one GPU?
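(For anyone hitting the same OOM: a minimal sketch of the two mitigations hinted at in the error message and the question above. The allocator setting `max_split_size_mb` comes straight from the PyTorch error text; the tiny `nn.Linear` is a hypothetical stand-in for the real 7B checkpoint, which the repo's own example script would load. This is not the repo's official fix, just an illustration of the general technique.)

```python
import os

# 1) Set the allocator config BEFORE any CUDA allocation, as the error
#    message suggests (max_split_size_mb mitigates fragmentation when
#    reserved memory is much larger than allocated memory).
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch
import torch.nn as nn

# Hypothetical stand-in for the real model; the actual 7B weights would be
# loaded by the repo's example inference script.
model = nn.Linear(4096, 4096)

# 2) Cast the weights to float16: each parameter drops from 4 bytes
#    (float32) to 2 bytes, halving the weight memory footprint.
model = model.half()

param_bytes = sum(p.numel() * p.element_size() for p in model.parameters())
print(f"fp16 parameter memory: {param_bytes / 1e6:.1f} MB")

# Only move to GPU if one is present, so the sketch also runs on CPU.
if torch.cuda.is_available():
    model = model.cuda()
```

For multi-GPU use, the stock inference script is launched with `torchrun` and a model-parallel world size matching the checkpoint; for the 7B weights that is a single shard, so splitting it across GPUs would require extra work beyond the example script.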

vincenzoml · Mar 02 '23 10:03