lollms-webui
How do I set a VRAM split with AutoGPTQ for multiple GPUs
I'm running a pair of RTX 3060 cards in one machine, and an RTX 3060 plus an RTX 3080 in my desktop. I need to set a split like 6,12, but the current max_gpu_mem_GB setting only takes a single value for one card.
On my desktop, when trying to load a 13B model, I see the following error because it ignores the other GPU, which has 12 GB free.
Couldn't build model: [CUDA out of memory. Tried to allocate 314.00 MiB (GPU 0; 9.77 GiB total capacity; 6.84 GiB already allocated; 61.44 MiB free; 7.05 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF]
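For reference, this is roughly what I'd want the loader to do under the hood: a rough sketch, assuming AutoGPTQ's from_quantized can take an accelerate-style max_memory dict (the parsing of a "6,12" setting and the model path are just illustrative, not the webui's actual config names):

```python
from auto_gptq import AutoGPTQForCausalLM

# Hypothetical: a "6,12" value from the settings page parsed into a
# per-device memory cap instead of a single max_gpu_mem_GB number.
vram_split_gb = [6, 12]
max_memory = {i: f"{gb}GiB" for i, gb in enumerate(vram_split_gb)}
max_memory["cpu"] = "16GiB"  # optional headroom for CPU offload

model = AutoGPTQForCausalLM.from_quantized(
    "path/to/13b-gptq-model",   # illustrative path
    device_map="auto",          # let accelerate spread layers across both cards
    max_memory=max_memory,      # e.g. {0: "6GiB", 1: "12GiB", "cpu": "16GiB"}
    use_safetensors=True,
)
```

With something like that, the 13B model could spill onto the second card instead of OOMing on GPU 0.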
Oh, and not related to my question: when the Twitter icon was updated, the GitHub and YouTube icons were both left with the same description, "Visit repository page".