ipex-llm Test cpu offload

Test cpu offload

Status: Open. hzjane opened this issue 6 months ago • 0 comments

Description

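Following https://github.com/analytics-zoo/vllm/blob/xiangyu_test_202411_0806/vllm/model_executor/models/utils.py#L76, this change adds the option to not load the whole model onto the XPU and to offload part of the weights to the CPU instead. Performance for Qwen1.5-32B on 4 cards with FP8, 9k-512 (input/output tokens):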

| Setting (vLLM 0.5.4) | Next-token latency (ms) |
|---|---|
| Baseline (no offload) | 73.26 |
| CPU offload, 1 GB | 799.36 |
| CPU offload, 2 GB | 1297.23 |

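Next-token latency grows by roughly 11x with 1 GB offloaded and roughly 18x with 2 GB offloaded, so CPU offload in its current form does not appear to be usable in practice.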
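To put the measurements in perspective, the three latencies from the table can be turned into slowdown factors and added milliseconds per token. This is a minimal Python sketch that uses only the numbers reported above; the names `latency_ms`, `slowdown`, and `extra_latency` are illustrative helpers, not part of ipex-llm or vLLM.

```python
from typing import Dict

# Next-token latency (ms) taken from the table above
# (Qwen1.5-32B, 4 cards, FP8, 9k-512). Keys are illustrative labels.
latency_ms: Dict[str, float] = {
    "baseline": 73.26,
    "offload 1GB": 799.36,
    "offload 2GB": 1297.23,
}


def slowdown(results: Dict[str, float], baseline_key: str = "baseline") -> Dict[str, float]:
    """Return each offload setting's latency as a multiple of the baseline latency."""
    base = results[baseline_key]
    return {name: ms / base for name, ms in results.items() if name != baseline_key}


def extra_latency(results: Dict[str, float], baseline_key: str = "baseline") -> Dict[str, float]:
    """Return the milliseconds each offload setting adds per token over the baseline."""
    base = results[baseline_key]
    return {name: ms - base for name, ms in results.items() if name != baseline_key}


if __name__ == "__main__":
    ratios = slowdown(latency_ms)
    extras = extra_latency(latency_ms)
    for name in ratios:
        print(f"{name}: {ratios[name]:.1f}x slower, +{extras[name]:.2f} ms per token")
```

Running it prints one line per offload setting; the ratios are about 10.9x and 17.7x, matching the rough 11x and 18x figures.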

1. Why the change?

2. User API changes

3. Summary of the change

4. How to test?

[ ] N/A
[ ] Unit test: Please manually trigger the PR Validation here by inputting the PR number (e.g., 1234), then paste your action link here once it has finished successfully.
[ ] Application test
[ ] Document test
[ ] ...

5. New dependencies

[ ] New Python dependencies
    - Dependency1
    - Dependency2
    - ...
[ ] New Java/Scala dependencies and their license
    - Dependency1 and license1
    - Dependency2 and license2
    - ...

Aug 27 '24 05:08 hzjane
