ipex-llm [LLM] Add quantize_kv optimization for yuan2 model


Open • sgwhat opened this issue

Add quantize_kv to optimize the performance of the Yuan2 model on MTL.

Feb 26 '24 07:02 sgwhat

The Yuan2 model requires quantize_kv support for a head_dim of 64, but quantize_kv currently supports only a head_dim of 128. I am working on it.
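As a rough illustration of the constraint above, here is a minimal sketch that gates a quantized KV-cache path on head_dim and falls back to an unquantized cache otherwise. It is not ipex-llm's implementation. The names `SUPPORTED_HEAD_DIMS`, `can_use_quantize_kv`, `quantize_head`, and `store_kv` are hypothetical. The symmetric per-vector int8 scheme is a stand-in chosen for simplicity, and the real quantize_kv kernel may use a different storage format.

```python
from typing import FrozenSet, List, Tuple

# Hypothetical: head dims the quantized-KV kernel can handle. Per the comment
# above, only 128 is supported for now; Yuan2 would also need 64.
SUPPORTED_HEAD_DIMS: FrozenSet[int] = frozenset({128})


def can_use_quantize_kv(head_dim: int, supported: FrozenSet[int] = SUPPORTED_HEAD_DIMS) -> bool:
    """Return True if the (hypothetical) quantized KV path handles this head_dim."""
    return head_dim in supported


def quantize_head(vec: List[float]) -> Tuple[List[int], float]:
    """Symmetric int8 quantization of one head's key or value vector.

    Illustrative stand-in only: stores int8 codes plus one float scale.
    """
    max_abs = max((abs(x) for x in vec), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(x / scale))) for x in vec]
    return codes, scale


def dequantize_head(codes: List[int], scale: float) -> List[float]:
    """Recover approximate float values from int8 codes and their scale."""
    return [c * scale for c in codes]


def store_kv(vec: List[float], head_dim: int,
             supported: FrozenSet[int] = SUPPORTED_HEAD_DIMS):
    """Cache one head vector: quantized if head_dim is supported, else as floats."""
    if len(vec) != head_dim:
        raise ValueError("vector length must equal head_dim")
    if can_use_quantize_kv(head_dim, supported):
        return ("int8", quantize_head(vec))
    return ("float", list(vec))  # fallback: unquantized cache entry
```

With the default set, a head_dim of 64 takes the float fallback. Adding 64 to the supported set, which is what the comment above describes working toward, would route Yuan2's heads through the quantized path.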

Feb 26 '24 07:02 sgwhat

Fix style please.

Feb 27 '24 07:02 qiuxin2012