
[Bug] Why does dual-CPU give no speedup?

Open • mrgaolei opened this issue 2 months ago • 2 comments

Checklist

  • [x] 1. I have searched related issues but cannot get the expected help.
  • [x] 2. The bug has not been fixed in the latest version.
  • [x] 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
  • [x] 4. If the issue you raised is not a bug but a question, please raise a discussion at https://github.com/kvcache-ai/ktransformers/discussions. Otherwise, it will be closed.
  • [x] 5. To help the community, I will use Chinese/English or attach a Chinese/English translation if using another language. Non-Chinese/English content without translation may be closed.

Describe the bug

Decode speed (tokens/s) with dual CPUs is the same as with a single CPU.

Reproduction

I bought a new dual-CPU server with AMX support and installed KTransformers with libnuma-dev and USE_NUMA=1, but the speed did not improve: it is always ≈ 13 tokens/s.

While it runs, I can see RAM usage grow from about 400 G (single CPU) to about 800 G (dual CPU), and cores on both CPUs are busy. But the decode speed is the same as on the single-CPU machine with 512 G RAM.
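One way to confirm that both sockets are exposed to the OS as separate NUMA nodes, and to see how the ≈800 G of in-use memory is split between them, is to read the kernel's per-node sysfs files. The sketch below is a hypothetical helper, not part of KTransformers; it assumes a Linux host with the standard `/sys/devices/system/node/nodeN/{meminfo,cpulist}` layout. `numactl --hardware` reports similar information.

```python
import os
import re

# Standard Linux sysfs location for NUMA topology (assumption: Linux host).
NODE_ROOT = "/sys/devices/system/node"


def numa_nodes(root=NODE_ROOT):
    """Return the sorted NUMA node ids found as nodeN directories under root."""
    if not os.path.isdir(root):
        return []
    ids = []
    for name in os.listdir(root):
        m = re.fullmatch(r"node(\d+)", name)  # skip files like "online", "possible"
        if m:
            ids.append(int(m.group(1)))
    return sorted(ids)


def node_mem_kb(node, root=NODE_ROOT):
    """Parse MemTotal and MemFree (in kB) from nodeN/meminfo."""
    path = os.path.join(root, f"node{node}", "meminfo")
    out = {}
    with open(path) as f:
        for line in f:
            # Lines look like: "Node 0 MemTotal:       263846184 kB"
            parts = line.split()
            if len(parts) >= 4 and parts[2] in ("MemTotal:", "MemFree:"):
                out[parts[2].rstrip(":")] = int(parts[3])
    return out


def node_cpulist(node, root=NODE_ROOT):
    """Return the CPU list string (e.g. "0-31,64-95") attached to a node."""
    with open(os.path.join(root, f"node{node}", "cpulist")) as f:
        return f.read().strip()


def report(root=NODE_ROOT):
    """Print node count, CPUs per node and per-node memory use in GiB."""
    nodes = numa_nodes(root)
    print(f"NUMA nodes visible: {len(nodes)}")
    for n in nodes:
        mem = node_mem_kb(n, root)
        total = mem.get("MemTotal", 0)
        used = total - mem.get("MemFree", 0)
        print(f"node{n}: cpus {node_cpulist(n, root)}, "
              f"{used / 1024**2:.0f} / {total / 1024**2:.0f} GiB used")


if __name__ == "__main__":
    report()
```

Running it while decoding and comparing the per-node "used" figures with the ≈400 G / ≈800 G totals above shows whether memory is actually spread across both sockets.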

Environment

  • CPU: 6416H × 2
  • RAM: 64 G DDR5 × 16 = 1 T
  • GPU: RTX 3090
  • KTransformers tag: v0.3.2

mrgaolei • Oct 28 '25 02:10

Is it possible that the bottleneck is now the GPU? The 3090 is quite close to the V100 in terms of FLOPS. Just a guess.

Mrw33554432 • Nov 04 '25 09:11


But isn't KTransformers' computation done entirely on the CPU, with the GPU used only for its video memory? During decoding, the CPU was at 100% while the GPU was only at 30%, and its fan wasn't even spinning.

mrgaolei • Nov 05 '25 03:11