Is this better for multi-GPU and split mode "graph"?

Open ikawrakow opened this issue 3 weeks ago • 0 comments

I only have a 2-GPU system, so I have no way to test which graph splitting strategy works best on a system with more than two GPUs. On the main branch I'm forcing a second graph split when combining partial tensor-parallel results. This may not be the best strategy, so this PR removes the second split.
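
For readers unfamiliar with the term, here is a toy sketch of what "combining partial tensor-parallel results" means. It is not the code in this PR and does not model graph splits or the scheduler at all. Plain Python lists stand in for per-GPU tensors, and every function name is hypothetical. The weight matrix is sliced by columns, each "device" computes a partial output from its slice, and the partials are summed at the end. That final summation is the combine step the description refers to.

```python
# Toy illustration only: plain Python lists stand in for per-GPU tensors.
# All names here are hypothetical and do not come from the repository.

def matvec(weight_rows, x):
    """Return W @ x for W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight_rows]

def split_columns(weight_rows, x, n_devices):
    """Slice W by columns and x by the matching entries, one shard per device."""
    n = len(x)
    bounds = [n * d // n_devices for d in range(n_devices + 1)]
    return [
        ([row[lo:hi] for row in weight_rows], x[lo:hi])
        for lo, hi in zip(bounds, bounds[1:])
    ]

def combine_partials(partials):
    """Sum the per-device partial outputs element-wise (the combine step)."""
    return [sum(vals) for vals in zip(*partials)]

def tensor_parallel_matvec(weight_rows, x, n_devices):
    """Compute W @ x as a sum of per-device partial results."""
    shards = split_columns(weight_rows, x, n_devices)
    partials = [matvec(w_shard, x_shard) for w_shard, x_shard in shards]
    return combine_partials(partials)
```

With more devices there are more partial results to combine, which is why the cost of this step may behave differently on larger systems than on a 2-GPU one.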

Please test with split mode "graph" on your multi-GPU system and let me know whether this PR gives better performance.

Dec 02 '25 08:12 ikawrakow