
[LLM] Add quantize_kv optimization for yuan2 model

Open sgwhat opened this issue 1 year ago • 1 comments

Description

Add quantize_kv support to improve the performance of the Yuan2 model on MTL (Meteor Lake).

1. Why the change?

2. User API changes

3. Summary of the change

4. How to test?

  • [ ] N/A
  • [ ] Unit test
  • [ ] Application test
  • [ ] Document test
  • [ ] ...

sgwhat avatar Feb 26 '24 07:02 sgwhat

The Yuan2 model uses a head_dim of 64, but quantize_kv currently supports only a head_dim of 128. I am working on adding support for a head_dim of 64.
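
To illustrate the head_dim constraint, here is a minimal sketch of a KV quantization routine that checks the head dimension before quantizing. It is not the ipex-llm kernel or API: the symmetric int8 scheme, the function names, and `SUPPORTED_HEAD_DIMS` are all assumptions made up for this example.

```python
# Illustrative sketch only; not the ipex-llm implementation.
# SUPPORTED_HEAD_DIMS is hypothetical: it models 64 being accepted alongside 128.
SUPPORTED_HEAD_DIMS = (64, 128)


def quantize_kv_vector(vec):
    """Quantize one key/value head vector to int8 codes plus a float scale."""
    head_dim = len(vec)
    if head_dim not in SUPPORTED_HEAD_DIMS:
        raise ValueError(f"unsupported head_dim: {head_dim}")
    max_abs = max(abs(x) for x in vec)
    # Avoid division by zero for an all-zero vector.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest code and clamp to the symmetric int8 range.
    codes = [max(-127, min(127, round(x / scale))) for x in vec]
    return codes, scale


def dequantize_kv_vector(codes, scale):
    """Recover approximate float values from int8 codes."""
    return [c * scale for c in codes]
```

With a guard like this, a vector of length 64 is quantized and any other unsupported length raises a `ValueError` instead of producing a silently wrong cache.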

sgwhat avatar Feb 26 '24 07:02 sgwhat

Fix style please.

qiuxin2012 avatar Feb 27 '24 07:02 qiuxin2012