ipex-llm
LLM: split chatglm3's mlp and use mlp fusion
Description
This PR splits chatglm3's MLP and uses MLP fusion, which can save about 1 ms on MTL. However, combining quantized KV cache with MLP fusion changes the output on Arc and MTL, which seems to be a known issue.
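For reviewers, here is a minimal sketch of why the split itself should not change results. It assumes chatglm3's MLP stacks the gate and up projections in one weight (the first half feeding SiLU, the second half the multiplier), so slicing that weight into two matrices gives the same activations. All function names are hypothetical and plain Python lists stand in for tensors; this is not the code in this PR and it does not model the fused kernel.

```python
import math
import random


def silu(x):
    # SiLU activation: x * sigmoid(x)
    return x / (1.0 + math.exp(-x))


def matvec(weight, x):
    # Multiply a row-major matrix (list of rows) by a vector.
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


def stacked_mlp_input(stacked_weight, x, ffn_size):
    # Assumed original layout: one projection yields 2 * ffn_size values,
    # which are then chunked into gate and up halves for SwiGLU.
    y = matvec(stacked_weight, x)
    gate, up = y[:ffn_size], y[ffn_size:]
    return [silu(g) * u for g, u in zip(gate, up)]


def split_weight(stacked_weight, ffn_size):
    # Slice the stacked weight into separate gate and up weights (hypothetical helper).
    return stacked_weight[:ffn_size], stacked_weight[ffn_size:]


def split_mlp_input(gate_weight, up_weight, x):
    # Split layout: two independent projections, the shape a fused MLP op can consume.
    gate = matvec(gate_weight, x)
    up = matvec(up_weight, x)
    return [silu(g) * u for g, u in zip(gate, up)]


random.seed(0)
hidden_size, ffn_size = 4, 6
weight = [[random.uniform(-1, 1) for _ in range(hidden_size)]
          for _ in range(2 * ffn_size)]
x = [random.uniform(-1, 1) for _ in range(hidden_size)]

reference = stacked_mlp_input(weight, x, ffn_size)
gate_w, up_w = split_weight(weight, ffn_size)
candidate = split_mlp_input(gate_w, up_w, x)
max_diff = max(abs(a - b) for a, b in zip(reference, candidate))
```

In this sketch `max_diff` stays within floating-point tolerance, which suggests the output change noted above comes from the interaction between the fused kernel and quantized KV cache rather than from the weight split.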
1. Why the change?
2. User API changes
3. Summary of the change
4. How to test?
- [ ] Unit test