[Draft] Tensor Parallel support for llama.cpp
- [x] I have read the contributing guidelines
- Self-reported review complexity:
- [ ] Low
- [x] Medium
- [ ] High

Add tensor parallel support to llama.cpp; this is still draft code.
Refer to https://github.com/ggerganov/llama.cpp/issues/9086 for the detailed design.
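For readers unfamiliar with the approach, below is a minimal conceptual sketch of the column-parallel matmul split that tensor parallelism is built on. This is plain, single-process C++ for illustration only, not the PR's actual code; the device count, the slicing scheme, and the final gather step are assumptions standing in for the real multi-GPU split and the oneCCL collective described in the linked issue.

```cpp
// Conceptual sketch of column-parallel tensor parallelism (NOT the PR's code).
// The weight matrix W [K x N] is split by columns across "devices"; each device
// computes a partial output y_i = x * W_i, and the full output is the
// concatenation of all partials (an all-gather in a real multi-GPU setup).
#include <cstdio>
#include <vector>

// Dense matvec: y[N] = x[K] * W[K x N], with W stored row-major.
static std::vector<float> matmul(const std::vector<float>& x,
                                 const std::vector<float>& W,
                                 int K, int N) {
    std::vector<float> y(N, 0.0f);
    for (int k = 0; k < K; ++k)
        for (int n = 0; n < N; ++n)
            y[n] += x[k] * W[k * N + n];
    return y;
}

int main() {
    const int K = 4, N = 6, n_dev = 2;      // hypothetical sizes / device count
    std::vector<float> x(K), W(K * N);
    for (int i = 0; i < K; ++i)      x[i] = 1.0f + i;
    for (int i = 0; i < K * N; ++i)  W[i] = 0.1f * i;

    // Reference: full matmul on a single device.
    std::vector<float> y_ref = matmul(x, W, K, N);

    // Tensor-parallel: each device holds one column slice W_i of shape [K x N/n_dev].
    const int N_loc = N / n_dev;
    std::vector<float> y_tp(N);
    for (int dev = 0; dev < n_dev; ++dev) {
        std::vector<float> W_loc(K * N_loc);
        for (int k = 0; k < K; ++k)
            for (int n = 0; n < N_loc; ++n)
                W_loc[k * N_loc + n] = W[k * N + dev * N_loc + n];
        std::vector<float> y_loc = matmul(x, W_loc, K, N_loc);
        // Stand-in for the all-gather: concatenate partial outputs into the full result.
        for (int n = 0; n < N_loc; ++n)
            y_tp[dev * N_loc + n] = y_loc[n];
    }

    // The concatenated tensor-parallel result matches the single-device one.
    for (int n = 0; n < N; ++n)
        std::printf("y_ref=%6.2f  y_tp=%6.2f\n", y_ref[n], y_tp[n]);
    return 0;
}
```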
@ClarkChin08 It's great to see this feature implemented.
Is it possible to update the guide/docs to explain how to use this feature:
- how to enable it.
- what the benefits are.
- which use cases should use this feature.
- how to install the dependent packages (oneCCL, MPI) from oneAPI.
Thank you!
Hello - was this feature completed?
@ClarkChin08, hello - was this feature completed?
Hi, thanks, and I appreciate the work. It would be great to have this feature added/completed; it would bring great performance for multi-GPU setups, similar to what vLLM already has.
This looks really interesting! Having TP support like vLLM does would bring some great speedups!
Looking forward to having this feature.
Just a bump, this feature would be really great for the community.
I suspect the OP has abandoned development and this feature is incomplete.
There's actually much more recent progress on this in https://github.com/ggml-org/llama.cpp/pull/13818 and https://github.com/ggml-org/llama.cpp/pull/13776, but it's not ready yet.