
LLM: split chatglm3's mlp and use mlp fusion

Open · rnwang04 opened this issue · 0 comments

Description

This change splits ChatGLM3's MLP and uses MLP fusion, which gives roughly a 1 ms improvement on MTL. However, combining a quantized KV cache with MLP fusion changes the output on Arc and MTL (this appears to be a known issue).
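As a rough illustration of what "splitting" the MLP can mean, here is a minimal sketch. It assumes the ChatGLM2/3-style layout, where a single `dense_h_to_4h` projection outputs the gate and up halves concatenated, and SwiGLU is applied by chunking that output in two (`silu(first_half) * second_half`). Splitting the fused weight row-wise into separate gate and up matrices gives a LLaMA-style gate/up/down MLP with identical output. That is presumably the form a fused MLP kernel expects. The function names are hypothetical, and plain Python lists stand in for tensors. This is not ipex-llm's actual implementation.

```python
import math
import random


def matvec(w, x):
    # w is a list of rows (out_features x in_features), like torch.nn.Linear.weight
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def silu(v):
    # SiLU / swish activation: v * sigmoid(v)
    return v / (1.0 + math.exp(-v))


def mlp_fused(w_h_to_4h, w_4h_to_h, x):
    # Assumed original layout: one projection yields [gate | up] concatenated
    h = matvec(w_h_to_4h, x)
    half = len(h) // 2
    gate, up = h[:half], h[half:]
    act = [silu(g) * u for g, u in zip(gate, up)]
    return matvec(w_4h_to_h, act)


def split_h_to_4h(w_h_to_4h):
    # Hypothetical helper: first half of the rows -> gate, second half -> up
    half = len(w_h_to_4h) // 2
    return w_h_to_4h[:half], w_h_to_4h[half:]


def mlp_split(w_gate, w_up, w_down, x):
    # LLaMA-style MLP: separate gate/up projections, then down projection
    gate = matvec(w_gate, x)
    up = matvec(w_up, x)
    act = [silu(g) * u for g, u in zip(gate, up)]
    return matvec(w_down, act)


# Small random example: hidden size 4, FFN size 6
rng = random.Random(0)
hidden, ffn = 4, 6
w_h_to_4h = [[rng.uniform(-1, 1) for _ in range(hidden)] for _ in range(2 * ffn)]
w_4h_to_h = [[rng.uniform(-1, 1) for _ in range(ffn)] for _ in range(hidden)]
x = [rng.uniform(-1, 1) for _ in range(hidden)]

w_gate, w_up = split_h_to_4h(w_h_to_4h)
y_fused = mlp_fused(w_h_to_4h, w_4h_to_h, x)
y_split = mlp_split(w_gate, w_up, w_4h_to_h, x)
assert all(math.isclose(a, b) for a, b in zip(y_fused, y_split))
```

Because the split only reslices existing weights, it can presumably be done once at load time. Any latency change would come from the fused kernel that the split layout enables, not from the split itself.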

1. Why the change?

2. User API changes

3. Summary of the change

4. How to test?

  • [ ] Unit test

rnwang04, Mar 26 '24 04:03