ipex-llm

LLM: Optimize qwen1.5 moe model

Open • hzjane opened this issue • 0 comments

Description

  • [x] convert moe block
  • [x] RMSNorm via llama_rms_norm_forward
  • [x] MLP via llama_mlp_forward
  • [x] kv_cache: for transformers-v4.40.4, replace past_key_value.seen_tokens with past_key_value._seen_tokens
  • [x] fused qkv
  • [x] fused rope
  • [x] flash_attention (first token)
  • [ ] esimd_sdp (next token)
  • [ ] quantize kv cache
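The checklist reuses llama_rms_norm_forward for the Qwen1.5 MoE RMSNorm. A plain-Python reference of the LLaMA-style RMSNorm math can help when comparing the optimized path against expected values. This is a minimal sketch, assuming the Qwen1.5 MoE norm follows the LLaMA formulation (scale by the reciprocal root mean square, no mean subtraction, no bias); rms_norm_reference is a hypothetical helper, not part of ipex-llm.

```python
import math
from typing import List, Sequence


def rms_norm_reference(
    hidden: Sequence[float], weight: Sequence[float], eps: float = 1e-6
) -> List[float]:
    """Hypothetical reference RMSNorm over a single hidden vector.

    Assumes the LLaMA-style formulation:
        y_i = weight_i * x_i / sqrt(mean(x^2) + eps)
    """
    if not hidden:
        raise ValueError("hidden must be non-empty")
    if len(hidden) != len(weight):
        raise ValueError("hidden and weight must have the same length")
    # Mean of squares over the hidden dimension (no mean subtraction).
    variance = sum(v * v for v in hidden) / len(hidden)
    inv_rms = 1.0 / math.sqrt(variance + eps)
    # Normalize each element, then apply the learned per-channel scale.
    return [w * v * inv_rms for v, w in zip(hidden, weight)]
```

For example, rms_norm_reference([3.0, 4.0], [1.0, 1.0], eps=0.0) rescales the vector so that its mean square is 1; a non-unit weight then scales each channel independently.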
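The kv_cache item switches from past_key_value.seen_tokens to past_key_value._seen_tokens for transformers-v4.40.4. One way to keep a forward function working with caches that expose either attribute is a small accessor. This is a hedged sketch: get_seen_tokens is a hypothetical helper, and it assumes only that the cache object carries one of the two attribute names cited in the checklist.

```python
from typing import Any


def get_seen_tokens(past_key_value: Any) -> int:
    """Hypothetical accessor for the number of tokens already in the cache.

    Prefers the `_seen_tokens` attribute (the name the checklist cites for
    transformers-v4.40.4) and falls back to the older `seen_tokens` name.
    Raises AttributeError if the cache exposes neither.
    """
    if hasattr(past_key_value, "_seen_tokens"):
        return past_key_value._seen_tokens
    # Older cache objects only expose the public attribute.
    return past_key_value.seen_tokens
```

Checking for _seen_tokens first means the newer attribute is read directly whenever it exists, so the older name is only touched on caches that lack it.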

1. Why the change?

2. User API changes

3. Summary of the change

4. How to test?

  • [ ] N/A
  • [ ] Unit test
  • [ ] Application test
  • [ ] Document test
  • [ ] ...

5. New dependencies

  • [ ] New Python dependencies
      - Dependency1
      - Dependency2
      - ...
  • [ ] New Java/Scala dependencies and their license
      - Dependency1 and license1
      - Dependency2 and license2
      - ...

hzjane, Apr 09 '24 05:04