ik_llama.cpp
Is this better for multi-GPU and split mode "graph"?
I only have a 2-GPU system, so I have no way to test which graph-splitting strategy works best on systems with more GPUs. On the main branch, I force a second graph split when combining the partial tensor-parallel results. This may not be the best strategy, so this PR removes the second split.
Please test with split mode "graph" on your multi-GPU system and let me know whether this PR gives better performance.