ipex-llm
Test cpu offload
Description
Following https://github.com/analytics-zoo/vllm/blob/xiangyu_test_202411_0806/vllm/model_executor/models/utils.py#L76, this change adds the option to not load the whole model onto the XPU, i.e. CPU offload. Performance for Qwen1.5-32B on 4 cards, fp8, 9k-512:
| 0.5.4 | Next Token (ms) |
|---|---|
| before | 73.26 |
| offload 1GB | 799.36 |
| offload 2GB | 1297.23 |
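Next-token latency rises from 73.26 ms to 799.36 ms with 1GB offloaded and to 1297.23 ms with 2GB offloaded, so the CPU offload function does not appear to be usable in its current state.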
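To put the regression in proportion, here is a small sketch that computes the slowdown relative to the baseline and the extra latency per offloaded GB. It only does arithmetic on the three numbers reported above; the helper names `slowdown` and `extra_ms_per_gb` are made up for illustration and are not part of ipex-llm or vLLM.

```python
# Next-token latencies (ms) copied from the table in this description.
BASELINE_MS = 73.26
OFFLOAD_MS = {1: 799.36, 2: 1297.23}  # offloaded GB -> next-token latency (ms)


def slowdown(latency_ms: float, baseline_ms: float = BASELINE_MS) -> float:
    """Return how many times slower a run is than the baseline (hypothetical helper)."""
    return latency_ms / baseline_ms


def extra_ms_per_gb(latency_ms: float, offload_gb: float,
                    baseline_ms: float = BASELINE_MS) -> float:
    """Return the added next-token latency per offloaded GB (hypothetical helper)."""
    return (latency_ms - baseline_ms) / offload_gb


for gb, ms in OFFLOAD_MS.items():
    print(f"offload {gb}GB: {slowdown(ms):.2f}x slower, "
          f"{extra_ms_per_gb(ms, gb):.1f} ms extra per offloaded GB")
```

With these inputs, offloading 1GB is roughly an 11x slowdown and 2GB roughly an 18x slowdown. This sketch does not measure anything itself and says nothing about where the extra time is spent.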
1. Why the change?
2. User API changes
3. Summary of the change
4. How to test?
- [ ] N/A
- [ ] Unit test: Please manually trigger the PR Validation here by inputting the PR number (e.g., 1234). And paste your action link here once it has been successfully finished.
- [ ] Application test
- [ ] Document test
- [ ] ...
5. New dependencies
- [ ] New Python dependencies
  - Dependency1
  - Dependency2
  - ...
- [ ] New Java/Scala dependencies and their license
  - Dependency1 and license1
  - Dependency2 and license2
  - ...