CodeQwen1.5-7B-Chat performance gap between pytorch and gguf
Target Platform: MTL (Intel Core Ultra 7 165H)
Issue: running codeqwen-1_5-7b-chat-q4_k_m.gguf with ipex-llm as the llama.cpp backend shows a performance gap compared with the PyTorch path (a minimal benchmarking sketch for the PyTorch side follows the requirements below).
Minimum throughput requirement: > 15 tokens/s
Ideal throughput requirement: > 20 tokens/s
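
For reference, here is a minimal sketch of how the PyTorch-path throughput can be measured with ipex-llm's transformers-style API on an Intel GPU (`xpu`). The model id, prompt, and token counts are placeholders for illustration, not taken from the original report; `load_in_4bit=True` applies ipex-llm's INT4 optimization, which is roughly the counterpart of the q4_k_m GGUF quantization level being compared against.

```python
# Sketch of a PyTorch-path throughput measurement, assuming the standard
# ipex-llm transformers API on an Intel GPU. Model id, prompt, and token
# counts below are assumptions, not from the original report.
import time
import torch
from transformers import AutoTokenizer
from ipex_llm.transformers import AutoModelForCausalLM

model_path = "Qwen/CodeQwen1.5-7B-Chat"  # assumed HF id for the FP16 checkpoint

# load_in_4bit=True converts weights to ipex-llm INT4 at load time
model = AutoModelForCausalLM.from_pretrained(
    model_path, load_in_4bit=True, trust_remote_code=True, use_cache=True
)
model = model.to("xpu")
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

prompt = "Write a quicksort function in Python."
input_ids = tokenizer.encode(prompt, return_tensors="pt").to("xpu")

with torch.inference_mode():
    # warmup run so kernel compilation does not skew the timing
    model.generate(input_ids, max_new_tokens=32)
    torch.xpu.synchronize()

    start = time.time()
    output = model.generate(input_ids, max_new_tokens=128)
    torch.xpu.synchronize()
    elapsed = time.time() - start

# Note: elapsed includes both prefill and decode, so this slightly
# understates pure decode speed for long prompts.
new_tokens = output.shape[1] - input_ids.shape[1]
print(f"generation throughput: {new_tokens / elapsed:.1f} tokens/s")  # target: > 15
```

The same prompt and generation length should be used on the llama.cpp side so the two numbers are directly comparable.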