
feat: Support Moore Threads GPU

Open yeahdongcn opened this issue 1 year ago • 6 comments

Moore Threads, a cutting-edge GPU startup, introduces MUSA (Moore Threads Unified System Architecture) as its foundational technology. This pull request marks the initial integration of MTGPU support into llama.cpp, leveraging MUSA's capabilities to enhance LLM inference performance.

Similar to https://github.com/ggerganov/llama.cpp/pull/1087, CUDA APIs are replaced by MUSA APIs using macros, and a new build option is added to both the Makefile and CMake builds.

# make
make GGML_MUSA=1

# CMake
cmake -B build -DGGML_MUSA=ON
cmake --build build --config Release

I also sent a PR to Ollama to integrate MTGPU into it, and all tests were performed through Ollama. The tested models are:

  • tinyllama:latest (1b)
  • llama3:latest (8b)
  • qwen2:72b

yeahdongcn avatar Jul 09 '24 01:07 yeahdongcn

I am one of the primary llama.cpp CUDA developers. I would in principle be willing to buy a Moore Threads GPU and to test any code changes I do in order to assert that they don't break MUSA. On the Moore Threads website I only see a "Buy Now" button for the MTT S80. Would testing and performance optimization on that GPU be representative of an MTT S4000?

JohannesGaessler avatar Jul 09 '24 07:07 JohannesGaessler

I am one of the primary llama.cpp CUDA developers. I would in principle be willing to buy a Moore Threads GPU and to test any code changes I do in order to assert that they don't break MUSA. On the Moore Threads website I only see a "Buy Now" button for the MTT S80. Would testing and performance optimization on that GPU be representative of an MTT S4000?

Thank you for checking out this PR! Yes, the current code changes were tested on the MTT S4000 (--cuda-gpu-arch=mp_22) and this model of GPU only ships with our data center solution. I will test the code changes on the MTT S80 (--cuda-gpu-arch=mp_21) and let you know the results.

yeahdongcn avatar Jul 09 '24 07:07 yeahdongcn

@JohannesGaessler @slaren I've addressed most of your comments—thank you again for the review. However, two comments related to compilation remain unresolved. I am currently collaborating with our compiler team to address these issues, but it may take longer than anticipated. Are there any other concerns regarding the remaining changes?

yeahdongcn avatar Jul 22 '24 01:07 yeahdongcn

Eventually we should move all the HIP and MUSA-specific code to its own headers.

No problem. I can start working on this.

yeahdongcn avatar Jul 25 '24 00:07 yeahdongcn

In an earlier post you said:

Thank you for checking out this PR! Yes, the current code changes were tested on the MTT S4000 (--cuda-gpu-arch=mp_22) and this model of GPU only ships with our data center solution. I will test the code changes on the MTT S80 (--cuda-gpu-arch=mp_21) and let you know the results.

Have there been any updates on this?

JohannesGaessler avatar Jul 25 '24 09:07 JohannesGaessler

In an earlier post you said:

Thank you for checking out this PR! Yes, the current code changes were tested on the MTT S4000 (--cuda-gpu-arch=mp_22) and this model of GPU only ships with our data center solution. I will test the code changes on the MTT S80 (--cuda-gpu-arch=mp_21) and let you know the results.

Have there been any updates on this?

I've encountered some compilation issues with the S80 toolchain and have opened several internal tickets with the compiler team. I'll monitor the progress and keep you updated.

The S80 toolchain (rc2.1.0_Intel_CPU_Ubuntu_quyuan) I used is publicly available but still in the RC stage. Please refer to this link.

yeahdongcn avatar Jul 25 '24 10:07 yeahdongcn

make error

Ubuntu 20.04.6 LTS, MUSA driver 2.7.0, SDK: MUSA+SDK-rc2.1.0_Intel_CPU_Ubuntu_chunxiao

Running make GGML_MUSA=1 fails. [screenshot of build error] Please help me.

1823616178 avatar Jul 29 '24 02:07 1823616178

My video card is an MTT S80, and my CPU is an AMD 2600.

1823616178 avatar Jul 29 '24 02:07 1823616178

make error

Ubuntu 20.04.6 LTS, MUSA driver 2.7.0, SDK: MUSA+SDK-rc2.1.0_Intel_CPU_Ubuntu_chunxiao

Running make GGML_MUSA=1 fails. [screenshot of build error] Please help me.

Please see the above comments:

I've encountered some compilation issues with the S80 toolchain and have opened several internal tickets with the compiler team. I'll monitor the progress and keep you updated.

The S80 toolchain (rc2.1.0_Intel_CPU_Ubuntu_quyuan) I used is publicly available but still in the RC stage. Please refer to this link.

We are still investigating this issue internally. Please expect a new MUSA SDK release and a follow-up llama.cpp PR.

yeahdongcn avatar Jul 29 '24 02:07 yeahdongcn

make error: Ubuntu 20.04.6 LTS, MUSA driver 2.7.0, SDK: MUSA+SDK-rc2.1.0_Intel_CPU_Ubuntu_chunxiao. Running make GGML_MUSA=1 fails. [screenshot of build error] Please help me.

Please see the above comments:

I've encountered some compilation issues with the S80 toolchain and have opened several internal tickets with the compiler team. I'll monitor the progress and keep you updated. The S80 toolchain (rc2.1.0_Intel_CPU_Ubuntu_quyuan) I used is publicly available but still in the RC stage. Please refer to this link.

We are still investigating this issue internally. Please expect a new MUSA SDK release and a follow-up llama.cpp PR.

Any progress on the S80?

XenoAmess avatar Aug 29 '24 07:08 XenoAmess

make error: Ubuntu 20.04.6 LTS, MUSA driver 2.7.0, SDK: MUSA+SDK-rc2.1.0_Intel_CPU_Ubuntu_chunxiao. Running make GGML_MUSA=1 fails. [screenshot of build error] Please help me.

Please see the above comments:

I've encountered some compilation issues with the S80 toolchain and have opened several internal tickets with the compiler team. I'll monitor the progress and keep you updated. The S80 toolchain (rc2.1.0_Intel_CPU_Ubuntu_quyuan) I used is publicly available but still in the RC stage. Please refer to this link.

We are still investigating this issue internally. Please expect a new MUSA SDK release and a follow-up llama.cpp PR.

Any progress on the S80?

I guess we'll have to wait for the next version of the SDK.

1823616178 avatar Aug 29 '24 09:08 1823616178

I guess we'll have to wait for the next version of the SDK.

Yes, please give us more time.

yeahdongcn avatar Aug 30 '24 00:08 yeahdongcn

@yeahdongcn please help: I've run into a problem compiling llama.cpp with MUSA SDK rc2.0.0 on Ubuntu 20.04.6 LTS. Running make GGML_MUSA=1 shows the following error: [screenshot of build error] Is there something I am doing wrong with the compilation?

Ivening avatar Sep 19 '24 11:09 Ivening

@Ivening We are still working on MTT S80 support, please see: https://github.com/ggerganov/llama.cpp/pull/9526

If you are interested in running llama.cpp on MTT S80, please add me through WeChat: yeahdongcn.

yeahdongcn avatar Sep 19 '24 12:09 yeahdongcn

@yeahdongcn thank you for your reply! Will this code work with MTT S3000?

Ivening avatar Sep 19 '24 12:09 Ivening

@yeahdongcn thank you for your reply! Will this code work with MTT S3000?

Haha, it seems that you're one of our business customers! The MTT S3000 shares the same architecture as the MTT S80, so I can test on the MTT S3000 as well.

yeahdongcn avatar Sep 20 '24 00:09 yeahdongcn

@yeahdongcn what speeds can we expect for ~8B models on the MTT S80?

arch-btw avatar Oct 22 '24 06:10 arch-btw

@yeahdongcn what speeds can we expect for ~8B models on the MTT S80?

~15 tokens/s (llama3.1:8b)

Please also see the recording of llama3.2:1b: asciicast

yeahdongcn avatar Oct 22 '24 08:10 yeahdongcn

@yeahdongcn very good, thank you

arch-btw avatar Oct 22 '24 14:10 arch-btw