Test cpu offload

Open hzjane opened this issue 6 months ago • 0 comments

Description

Following https://github.com/analytics-zoo/vllm/blob/xiangyu_test_202411_0806/vllm/model_executor/models/utils.py#L76, this change makes it possible to avoid loading the whole model onto the XPU (CPU offload). Performance for Qwen1.5-32B, 4 cards, fp8, 9k-512:

| 0.5.4       | Next Token (ms) |
|-------------|-----------------|
| before      | 73.26           |
| offload 1GB | 799.36          |
| offload 2GB | 1297.23         |

With next-token latency roughly 11x higher at 1 GB offloaded and roughly 18x higher at 2 GB, the CPU offload function does not seem usable in practice.
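To put the measurements above in perspective, here is a small sketch that computes the slowdown of each offload setting relative to the baseline. The dictionary keys and the `slowdown` helper are illustrative names only, not part of ipex-llm or vLLM; the latency values are copied from the table.

```python
# Next-token latencies in milliseconds, copied from the table above.
latency_ms = {
    "before": 73.26,
    "offload 1GB": 799.36,
    "offload 2GB": 1297.23,
}


def slowdown(latencies, baseline_key="before"):
    """Return each non-baseline latency divided by the baseline latency."""
    base = latencies[baseline_key]
    return {
        name: value / base
        for name, value in latencies.items()
        if name != baseline_key
    }


ratios = slowdown(latency_ms)
for name, ratio in ratios.items():
    # Prints 10.9x for "offload 1GB" and 17.7x for "offload 2GB".
    print(f"{name}: {ratio:.1f}x slower than baseline")
```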

1. Why the change?

2. User API changes

3. Summary of the change

4. How to test?

  • [ ] N/A
  • [ ] Unit test: Please manually trigger the PR Validation here by inputting the PR number (e.g., 1234). And paste your action link here once it has been successfully finished.
  • [ ] Application test
  • [ ] Document test
  • [ ] ...

5. New dependencies

  • [ ] New Python dependencies - Dependency1 - Dependency2 - ...
  • [ ] New Java/Scala dependencies and their license - Dependency1 and license1 - Dependency2 and license2 - ...

hzjane, Aug 27 '24 05:08