Ultra5 125H / Ultra7 255H: large inference-speed differences across ollama/llama-cpp versions
6011: Ultra5 125H; 6011 Pro: Ultra7 255H
With the same qwen2.5-1.5b model, inference speed differs by more than 50% between ollama and llama-cpp.
@rnwang04 any comments?
Hi, on a 255H running Windows 11, ollama run qwen2.5:1.5b reaches about 56 token/s.
Please use the latest ollama-intel-2.3.0b20250630-ubuntu.tgz from https://www.modelscope.cn/models/Intel/ollama/files, then run clinfo | grep "Driver Version" and share the driver versions of both the 255H and the 125H,
along with the download link of the llama-cpp-ipex-llm build you used.
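A minimal shell sketch for collecting the details requested above on each machine (assumes the clinfo package is available; guarded so it degrades gracefully where it is not):

```shell
# Print the kernel version and the OpenCL driver version requested above.
uname -r
if command -v clinfo >/dev/null 2>&1; then
    clinfo | grep "Driver Version"
else
    echo "clinfo not installed (e.g. apt install clinfo)"
fi
```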
Hello all, I have the same ARL 255H + DDR5-5600 environment (GPU runtime 25.22.33944.8, Linux kernel 6.16.0-rc6) and quickly tested three versions of IPEX-LLM (for llama.cpp). The difference in test results may come from the default inference parameters each version uses for the model:
- ollama-ipex-llm-2.3.0b20250710-ubuntu, ollama-ipex-llm-2.3.0b20250630-ubuntu --> n_seq_max=2, n_ctx=4096, n_ctx_per_seq=2048...
- deepseek-r1:7b: 15.97 tokens/s
- qwen2.5:1.5b-instruct: 47.51 tokens/s
- ollama-ipex-llm-2.2.0-ubuntu --> n_seq_max=1, n_ctx=2048...
- deepseek-r1:7b: 15.78 tokens/s
- qwen2.5:1.5b-instruct: 50.54 tokens/s
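The tokens/s figures above are the decode ("eval rate") numbers that ollama prints when run with --verbose. A small sketch for extracting that figure programmatically, so runs on the 125H and 255H can be compared consistently (the sample text below is illustrative, not a real log):

```python
import re

def parse_eval_rate(verbose_output: str) -> float:
    """Extract the decode speed (tokens/s) from `ollama run --verbose` output.

    Matches the line starting with "eval rate:" (not "prompt eval rate:").
    """
    m = re.search(r"^eval rate:\s*([\d.]+)\s*tokens/s", verbose_output, re.MULTILINE)
    if m is None:
        raise ValueError("no eval rate line found")
    return float(m.group(1))

# Illustrative sample of the verbose footer ollama prints after a response.
sample = """
total duration:       2.1s
eval count:           100 token(s)
eval rate:            47.51 tokens/s
"""
print(parse_eval_rate(sample))  # → 47.51
```

Parsing the same line on both machines avoids comparing a prompt-eval number on one box against a decode number on the other.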
Please test again with ollama-ipex-llm-2.3.0b20250710-ubuntu and the latest GPU runtime on ARL-H, and share the results for reference.
Thanks,
Gary