text-generation-webui
Has anyone run it in Colab with LLaMA?
Try this: https://colab.research.google.com/drive/1UJnzwI6uCScDFjmHNGQ8LZcmRT70mgwU?usp=sharing
It's awesome that even the 13B model can be run in Colab. However, the context window is pretty limited: I get an OutOfMemoryError at 314 words.
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 14.75 GiB total capacity; 13.50 GiB already allocated; 18.81 MiB free; 13.71 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
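The traceback's own suggestion is worth trying: set max_split_size_mb via PYTORCH_CUDA_ALLOC_CONF before anything touches CUDA. A minimal sketch; the 128 MiB value is an assumption, not a tuned recommendation:

```python
import os

# Must be set before the first CUDA allocation (i.e. before the model is loaded).
# 128 MiB is illustrative; smaller splits reduce fragmentation at some speed cost.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch  # imported afterwards so the allocator is guaranteed to see it
```

This only mitigates fragmentation; it won't help if the weights plus KV cache genuinely exceed the ~15 GiB on the Colab GPU.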
I'm not able to run this. It runs out of memory and dies while loading the model. The Colab instance I'm using has 12.7 GB of RAM.
Is there a way to shrink RAM usage with a 4-bit version? Or do I just use Colab Pro or spin up an instance on GCP?
Update: Tried Colab Pro with 25.5 GB of RAM and ran into the same issue on startup.
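A startup crash like this is often system RAM, not GPU memory: by default transformers materializes the checkpoint in fp32 before moving it, which for 13B is far more than 25.5 GB. A hedged sketch of fp16 low-memory loading (the checkpoint name is illustrative, and low_cpu_mem_usage may require accelerate to be installed):

```python
import torch
from transformers import AutoModelForCausalLM

# Load weights shard-by-shard in fp16 instead of building a full fp32 copy
# in system RAM first, which is a plausible cause of the startup OOM here.
model = AutoModelForCausalLM.from_pretrained(
    "huggyllama/llama-13b",      # illustrative checkpoint, not the notebook's
    torch_dtype=torch.float16,   # roughly halves load-time memory vs default fp32
    low_cpu_mem_usage=True,      # stream shards rather than materializing everything
)
```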
Is this a quantized 4-bit version or the original?
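For anyone hitting the same wall: loading in 4-bit via bitsandbytes is one way to shrink memory use. A minimal sketch, assuming a recent transformers with bitsandbytes and accelerate installed; the checkpoint name is illustrative:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = "huggyllama/llama-13b"  # illustrative checkpoint

# Quantize the weights to 4-bit at load time, cutting GPU memory roughly 4x vs fp16.
bnb_config = BitsAndBytesConfig(load_in_4bit=True)

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",  # let accelerate place layers across the available GPU/CPU
)
```

device_map="auto" also enables the same low-memory loading path as low_cpu_mem_usage, so it may help with the startup crash above as well; I haven't verified this in Colab.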
This issue has been closed due to inactivity for 30 days. If you believe it is still relevant, please leave a comment below.