Multi-GPU isn't supported?
ggml_backend_cuda_init seems to be called with device 0 hardcoded. It then crashes with out-of-memory because that GPU only has 8 GB, while another device has 16 GB.
For the benefit of anyone who finds this issue: you can use CUDA_VISIBLE_DEVICES=x to force a different GPU to be "index 0". I am running successfully on 2 GPUs; I just have to load balance externally.
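A sketch of what that looks like in practice. The binary name and flags here are illustrative (adjust to your sd.cpp build); the key point is that CUDA_VISIBLE_DEVICES remaps which physical GPU appears as CUDA index 0:

```shell
# Make physical GPU 1 appear as CUDA device 0 for this process.
CUDA_VISIBLE_DEVICES=1 ./sd -m model.safetensors -p "a photo of a cat"

# Crude external load balancing: pin each worker process to its own GPU.
CUDA_VISIBLE_DEVICES=0 ./sd -m model.safetensors -p "prompt A" &
CUDA_VISIBLE_DEVICES=1 ./sd -m model.safetensors -p "prompt B" &
wait
```

Note this doesn't split one model across cards; each process still needs a GPU with enough VRAM for the whole model.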
@the-crypt-keeper is there any way to make this work with multi-gpu on Vulkan?
@evcharger, the Vulkan equivalent is GGML_VK_VISIBLE_DEVICES; for instance, GGML_VK_VISIBLE_DEVICES=1 will make only the second card available to sd.cpp.
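For example (the binary name and flags are placeholders for your own build; the environment variable filters which Vulkan devices ggml enumerates):

```shell
# Expose only the second Vulkan device (index 1) to sd.cpp.
GGML_VK_VISIBLE_DEVICES=1 ./sd -m model.safetensors -p "a photo of a cat"
```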
I have multiple GPUs that only have enough VRAM together. Is there some way to load different layers onto different cards? Or to use one GPU and keep a small number of layers in system RAM?