ipex-llm
[LLM] Add quantize_kv optimization for yuan2 model
Description
Add quantize_kv to optimize the performance of the Yuan2 model on MTL.
1. Why the change?
2. User API changes
3. Summary of the change
4. How to test?
- [ ] N/A
- [ ] Unit test
- [ ] Application test
- [ ] Document test
- [ ] ...
The Yuan2 model requires quantize_kv support for a head_dim of 64, but quantize_kv only supports a head_dim of 128 for now. I am currently working on it.
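To make the head_dim restriction concrete, here is a rough, standalone sketch of how a quantized KV path might be gated on head_dim. It is not the ipex-llm implementation: `SUPPORTED_HEAD_DIMS`, `supports_quantized_kv`, `quantize_vector`, `dequantize_vector`, and `store_kv` are hypothetical names, and symmetric int8 quantization is used only as a stand-in for whatever format quantize_kv actually stores.

```python
# Hypothetical sketch, not ipex-llm code: gate a quantized KV path on head_dim.
# Assumption: only head_dim 128 is supported, as stated in the comment above.
SUPPORTED_HEAD_DIMS = frozenset({128})


def supports_quantized_kv(head_dim, supported=SUPPORTED_HEAD_DIMS):
    """Return True if the quantized KV path can handle this head_dim."""
    return head_dim in supported


def quantize_vector(values):
    """Symmetric int8 quantization of one head vector: returns (codes, scale)."""
    max_abs = max((abs(v) for v in values), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize_vector(codes, scale):
    """Approximate reconstruction of the original vector."""
    return [c * scale for c in codes]


def store_kv(values, supported=SUPPORTED_HEAD_DIMS):
    """Quantize when head_dim is supported, otherwise keep full precision."""
    if supports_quantized_kv(len(values), supported):
        codes, scale = quantize_vector(values)
        return ("int8", codes, scale)
    return ("fp", list(values), None)


if __name__ == "__main__":
    yuan2_like = [0.01 * i for i in range(64)]   # head_dim 64
    other = [0.01 * i for i in range(128)]       # head_dim 128
    print(store_kv(yuan2_like)[0])                          # falls back to "fp"
    print(store_kv(other)[0])                               # takes the "int8" path
    print(store_kv(yuan2_like, frozenset({64, 128}))[0])    # "int8" once 64 is added
```

Under these assumptions, supporting Yuan2 amounts to extending the set of accepted head_dim values and making sure the underlying kernel handles the smaller dimension.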
Please fix the style issues.