feat: Support Moore Threads GPU
Moore Threads is a GPU startup whose foundational technology is MUSA (Moore Threads Unified System Architecture). This pull request adds initial MTGPU support to llama.cpp, using MUSA to accelerate LLM inference.
Similar to https://github.com/ggerganov/llama.cpp/pull/1087, CUDA APIs are replaced by MUSA APIs using macros, and a new build option is added to the Makefile and CMake builds.
```shell
# make
make GGML_MUSA=1

# CMake
cmake -B build -DGGML_MUSA=ON
cmake --build build --config Release
```
I also sent a PR to Ollama to integrate MTGPU support, and all tests were performed through Ollama. Tested models are:
- tinyllama:latest (1b)
- llama3:latest (8b)
- qwen2:72b
- [x] I have read the contributing guidelines
- Self-reported review complexity:
- [ ] Low
- [x] Medium
- [ ] High
I am one of the primary llama.cpp CUDA developers. I would in principle be willing to buy a Moore Threads GPU and to test any code changes I do in order to assert that they don't break MUSA. On the Moore Threads website I only see a "Buy Now" button for the MTT S80. Would testing and performance optimization on that GPU be representative of an MTT S4000?
Thank you for checking out this PR! Yes, the current code changes were tested on the MTT S4000 (--cuda-gpu-arch=mp_22) and this model of GPU only ships with our data center solution. I will test the code changes on the MTT S80 (--cuda-gpu-arch=mp_21) and let you know the results.
@JohannesGaessler @slaren I've addressed most of your comments—thank you again for the review. However, two comments related to compilation remain unresolved. I am currently collaborating with our compiler team to address these issues, but it may take longer than anticipated. Are there any other concerns regarding the remaining changes?
Eventually we should move all the HIP and MUSA-specific code to its own headers.
No problem. I can start working on this.
In an earlier post you said:
Thank you for checking out this PR! Yes, the current code changes were tested on the MTT S4000 (--cuda-gpu-arch=mp_22) and this model of GPU only ships with our data center solution. I will test the code changes on the MTT S80 (--cuda-gpu-arch=mp_21) and let you know the results.
Have there been any updates on this?
I've encountered some compilation issues on S80 toolchain and have opened several internal tickets to the compiler team. I'll monitor the progress and keep you updated.
The S80 toolchain (rc2.1.0_Intel_CPU_Ubuntu_quyuan) I used is publicly available but still in the RC stage. Please refer to the link.
make error
Ubuntu 20.04.6 LTS, MUSA driver 2.7.0, SDK: MUSA SDK rc2.1.0_Intel_CPU_Ubuntu_chunxiao
make GGML_MUSA=1
Please help me.
My video card is an MTT S80 and my CPU is an AMD 2600.
Please see the above comments:
I've encountered some compilation issues on S80 toolchain and have opened several internal tickets to the compiler team. I'll monitor the progress and keep you updated.
The S80 toolchain (rc2.1.0_Intel_CPU_Ubuntu_quyuan) I used is publicly available but still in the RC stage. Please refer to link
We are still investigating this issue internally. Please expect a new release of MUSA SDK and llama.cpp PR.
Any progress on the S80?
I guess we'll have to wait for the next version of the SDK.
Yes, please give us more time.
@yeahdongcn please help: I have a problem compiling llama.cpp using MUSA SDK rc2.0.0 on Ubuntu 20.04.6 LTS.
Running make GGML_MUSA=1 produces the following error:
Is there something I am doing wrong with the compilation?
@Ivening We are still working on MTT S80 support, please see: https://github.com/ggerganov/llama.cpp/pull/9526
If you are interested in running llama.cpp on MTT S80, please add me through WeChat: yeahdongcn.
@yeahdongcn thank you for your reply! Will this code work with MTT S3000?
Haha, it seems you're one of our business customers! The MTT S3000 shares the same architecture as the MTT S80, so I can test on the MTT S3000 as well.
@yeahdongcn what speeds can we expect for ~8B models for the MTT S80?
~15 tokens/s (llama3.1:8b)
@yeahdongcn very good, thank you