ktransformers inference speed on Intel GPUs

I'd like to know what DeepSeek inference speeds people are getting on different Intel GPU configurations, as a reference. Thanks!

May 15 '25 08:05 zhaoyukoon

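Performance seems to depend on both the CPU and the memory configuration. I have a machine with a 5th-gen Xeon and one A770; DeepSeek R1 Q4_K_M in local_chat mode (cpuinfer 53) reaches roughly 7-8 tps, which feels good enough.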

May 16 '25 11:05 Thoughtlesstime

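Performance seems to depend on both the CPU and the memory configuration. I have a machine with a 5th-gen Xeon and one A770; DeepSeek R1 Q4_K_M in local_chat mode (cpuinfer 53) reaches roughly 7-8 tps, which feels good enough.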

Could you describe the machine's configuration in detail?

May 16 '25 13:05 tsdcz

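With a 6th-gen Xeon + 2 Intel Arc A770 cards, running the full-size model with Q4 quantization, I get 1.9~2.9 t/s using 44 threads. Is everyone else seeing the same?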

Jun 18 '25 00:06 drew-ye

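With a 6th-gen Xeon + 2 Intel Arc A770 cards, running the full-size model with Q4 quantization, I get 1.9~2.9 t/s using 44 threads. Is everyone else seeing the same?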

What's your OS distro and version, and what are your memory size and speed? Different software component versions can lead to different performance. You can try the xpu docker to see if there is any difference.

Jun 18 '25 02:06 aubreyli
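As a rough illustration of why memory speed comes up in this thread: CPU-side decode is often limited by how fast the active weights can be streamed from RAM, so a back-of-envelope ceiling can be estimated from memory bandwidth. The sketch below is only that, not how ktransformers measures or predicts speed. The function names are hypothetical, the channel count and the ~4.8 bits per weight for Q4_K_M are assumptions to replace with your own values, and it ignores GPU offload, compute limits, and NUMA effects, so real throughput will be lower.

```python
def peak_bandwidth_gb_s(channels: int, mt_per_s: int, bus_bytes: int = 8) -> float:
    """Theoretical peak DRAM bandwidth in GB/s.

    channels  : populated memory channels (assumption: check your board and DIMM layout)
    mt_per_s  : memory transfer rate, e.g. 4800 for DDR5-4800
    bus_bytes : bytes per transfer per channel (8 for a 64-bit channel)
    """
    return channels * mt_per_s * bus_bytes / 1000.0  # MB/s -> GB/s


def decode_tps_upper_bound(bandwidth_gb_s: float,
                           active_params_billion: float,
                           bits_per_weight: float) -> float:
    """Upper bound on decode tokens/s if every active weight is read from RAM once per token."""
    gb_per_token = active_params_billion * bits_per_weight / 8.0
    return bandwidth_gb_s / gb_per_token


if __name__ == "__main__":
    # Assumed example values: 8 channels of DDR5-4800, 37B active parameters
    # per token (DeepSeek R1 MoE), ~4.8 bits per weight for Q4_K_M.
    bw = peak_bandwidth_gb_s(channels=8, mt_per_s=4800)
    tps = decode_tps_upper_bound(bw, active_params_billion=37.0, bits_per_weight=4.8)
    print(f"peak bandwidth ~ {bw:.1f} GB/s, decode ceiling ~ {tps:.1f} tokens/s")
```

Halving the populated channels or the transfer rate halves this ceiling, which is one reason two machines with similar CPUs and GPUs can report quite different t/s.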

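With a 6th-gen Xeon + 2 Intel Arc A770 cards, running the full-size model with Q4 quantization, I get 1.9~2.9 t/s using 44 threads. Is everyone else seeing the same?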

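My average measured speed is even slower than yours, only 1.6 tokens/s ... I deployed with docker and ran unsloth/DeepSeek-R1-0528-Q4_K_M_GGuf, and it can only use a single A770. Are you able to use both of your cards? My CPU is an Intel(R) Xeon(R) w5-3525 with 32 cores, and I have 512 GB of DDR5-4800.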

Jul 15 '25 05:07 icm-ai

Please refer to issue #1329. According to your hardware configuration, your decode speed should be around 5 to 6 tokens per second.

Jul 15 '25 06:07 aubreyli

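With a 6th-gen Xeon + 2 Intel Arc A770 cards, running the full-size model with Q4 quantization, I get 1.9~2.9 t/s using 44 threads. Is everyone else seeing the same?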

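My average measured speed is even slower than yours, only 1.6 tokens/s ... I deployed with docker and ran unsloth/DeepSeek-R1-0528-Q4_K_M_GGuf, and it can only use a single A770. Are you able to use both of your cards? My CPU is an Intel(R) Xeon(R) w5-3525 with 32 cores, and I have 512 GB of DDR5-4800.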

Correction: it only has 16 cores... The speed really is slow; with Q4_K_M the best I get is a little over 2 tokens/s.

Jul 23 '25 02:07 icm-ai