CodeQwen1.5-7B-Chat performance gap between pytorch and gguf
Target Platform: MTL (Intel Core Ultra 7 165H)
Issue: running codeqwen-1_5-7b-chat-q4_k_m.gguf with ipex-llm as the llama.cpp backend shows a performance gap compared with the PyTorch path (a minimal benchmarking sketch for the PyTorch side follows the requirements below).
Minimum throughput requirement: > 15 tokens/s
Ideal throughput requirement: > 20 tokens/s
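
For reference, here is a minimal sketch of how the PyTorch-path throughput can be measured with ipex-llm's transformers-style API on an Intel GPU (`xpu`). The model id, prompt, and token counts are placeholders for illustration, not taken from the original report; `load_in_4bit=True` applies ipex-llm's INT4 optimization, which is roughly the counterpart of the q4_k_m GGUF quantization level being compared against.

```python
# Sketch of a PyTorch-path throughput measurement, assuming the standard
# ipex-llm transformers API on an Intel GPU. Model id, prompt, and token
# counts below are assumptions, not from the original report.
import time
import torch
from transformers import AutoTokenizer
from ipex_llm.transformers import AutoModelForCausalLM

model_path = "Qwen/CodeQwen1.5-7B-Chat"  # assumed HF id for the FP16 checkpoint

# load_in_4bit=True converts weights to ipex-llm INT4 at load time
model = AutoModelForCausalLM.from_pretrained(
    model_path, load_in_4bit=True, trust_remote_code=True, use_cache=True
)
model = model.to("xpu")
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

prompt = "Write a quicksort function in Python."
input_ids = tokenizer.encode(prompt, return_tensors="pt").to("xpu")

with torch.inference_mode():
    # warmup run so kernel compilation does not skew the timing
    model.generate(input_ids, max_new_tokens=32)
    torch.xpu.synchronize()

    start = time.time()
    output = model.generate(input_ids, max_new_tokens=128)
    torch.xpu.synchronize()
    elapsed = time.time() - start

# Note: elapsed includes both prefill and decode, so this slightly
# understates pure decode speed for long prompts.
new_tokens = output.shape[1] - input_ids.shape[1]
print(f"generation throughput: {new_tokens / elapsed:.1f} tokens/s")  # target: > 15
```

The same prompt and generation length should be used on the llama.cpp side so the two numbers are directly comparable.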