ipex-llm LLM: split chatglm3's mlp and use mlp fusion

Open rnwang04 opened this issue 11 months ago • 0 comments

Description

This PR splits chatglm3's MLP and uses MLP fusion, which can give a ~1ms improvement on MTL. However, quantized KV cache combined with MLP fusion changes the output on Arc and MTL (this appears to be a known issue).
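As a rough illustration of what "splitting" the MLP means, here is a minimal, dependency-free sketch. It assumes the ChatGLM3 MLP uses a single `dense_h_to_4h` projection whose output is chunked in half into a gate part and an up part before SwiGLU. The helper names (`split_h_to_4h`, `mlp_split`, `w_gate`, `w_up`) are hypothetical and are not taken from this PR. The fused kernel itself is not shown; the sketch only checks that the split layout produces the same result as the original one.

```python
import math

def silu(x):
    # SiLU activation: x * sigmoid(x)
    return x / (1.0 + math.exp(-x))

def matvec(weight, x):
    # weight is a list of rows (out_features x in_features), x is a vector
    return [sum(w * v for w, v in zip(row, x)) for row in weight]

def mlp_unsplit(x, w_h_to_4h, w_4h_to_h):
    # Original layout: one projection emits [gate | up], then SwiGLU
    h = matvec(w_h_to_4h, x)
    half = len(h) // 2
    gate, up = h[:half], h[half:]
    act = [silu(g) * u for g, u in zip(gate, up)]
    return matvec(w_4h_to_h, act)

def split_h_to_4h(w_h_to_4h):
    # Hypothetical helper: cut the stacked weight into gate and up halves
    half = len(w_h_to_4h) // 2
    return w_h_to_4h[:half], w_h_to_4h[half:]

def mlp_split(x, w_gate, w_up, w_down):
    # Split layout: separate gate/up projections, the shape a fused
    # gate/up/activation kernel would typically expect (assumption)
    gate = matvec(w_gate, x)
    up = matvec(w_up, x)
    act = [silu(g) * u for g, u in zip(gate, up)]
    return matvec(w_down, act)

# Tiny example: hidden size 2, FFN size 3 (arbitrary illustrative values)
x = [0.5, -1.0]
w_h_to_4h = [[0.1, 0.2], [-0.3, 0.4], [0.5, -0.6],
             [0.7, 0.8], [-0.9, 1.0], [1.1, -1.2]]
w_4h_to_h = [[0.2, -0.1, 0.3], [0.4, 0.5, -0.6]]

w_gate, w_up = split_h_to_4h(w_h_to_4h)
out_unsplit = mlp_unsplit(x, w_h_to_4h, w_4h_to_h)
out_split = mlp_split(x, w_gate, w_up, w_4h_to_h)
```

Both paths compute the same values here, so any output change seen with quantized KV cache plus MLP fusion would have to come from the fused kernel or the quantization path rather than from the split itself.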

1. Why the change?

2. User API changes

3. Summary of the change

4. How to test?

[ ] Unit test

Mar 26 '24 04:03 rnwang04
