[XPU] Support enable_logprob
Motivation
This PR adds Logprobs support for XPU (Kunlun chips) to the FastDeploy LLM inference engine.
Previously, Logprobs functionality was restricted to CUDA platforms, which prevented users from leveraging this sampling feature on XPU devices.
Modifications
This PR involves changes across the engine configuration, the worker logic, and the custom XPU operators.
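A minimal sketch of the kind of platform gate this touches (names and structure are illustrative, not the actual `fastdeploy/engine/args_utils.py` code):

```python
# Illustrative sketch only -- the real check lives in fastdeploy/engine/args_utils.py
# and its names/structure may differ.
SUPPORTED_LOGPROB_PLATFORMS = {"cuda", "xpu"}  # "xpu" is what this PR adds

def validate_enable_logprob(enable_logprob: bool, platform: str) -> None:
    """Fail fast when --enable-logprob is requested on an unsupported platform."""
    if enable_logprob and platform not in SUPPORTED_LOGPROB_PLATFORMS:
        raise ValueError(
            f"enable_logprob is not supported on platform '{platform}'; "
            f"supported platforms: {sorted(SUPPORTED_LOGPROB_PLATFORMS)}"
        )

validate_enable_logprob(True, "xpu")  # passes with this PR; previously only "cuda" did
```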
Usage or Command
On XPU, start the server with `--enable-logprob` and set `"logprobs": true` in the API request payload:
```bash
export XPU_VISIBLE_DEVICES="0"
python -m fastdeploy.entrypoints.openai.api_server \
    --model /work/PaddlePaddle/ERNIE-4.5-0.3B-Paddle \
    --port 8188 \
    --tensor-parallel-size 1 \
    --max-model-len 32768 \
    --max-num-seqs 128 \
    --quantization "wint8" \
    --gpu-memory-utilization 0.9 \
    --enable-logprob
```
```bash
curl -X POST "http://0.0.0.0:8188/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -d '{
        "messages": [
            {"role": "user", "content": "Hello! Please tell me a short story."}
        ],
        "logprobs": true,
        "top_logprobs": 5,
        "max_tokens": 50
    }'
```
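Since the server exposes an OpenAI-compatible endpoint, the same request can also be issued with the OpenAI Python client; the base URL, model path, and parameters below simply mirror the commands above:

```python
# Sketch: read per-token logprobs from the server started above.
# base_url, model path, and sampling parameters mirror the launch/curl commands.
from openai import OpenAI

client = OpenAI(base_url="http://0.0.0.0:8188/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="/work/PaddlePaddle/ERNIE-4.5-0.3B-Paddle",
    messages=[{"role": "user", "content": "Hello! Please tell me a short story."}],
    logprobs=True,
    top_logprobs=5,
    max_tokens=50,
)

# Each generated token carries its own logprob plus the top-5 alternatives.
for token_info in resp.choices[0].logprobs.content:
    alternatives = {t.token: round(t.logprob, 3) for t in token_info.top_logprobs}
    print(token_info.token, round(token_info.logprob, 3), alternatives)
```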
Accuracy Tests
This change affects the Logprobs output structure and platform support, not the core inference results. Logprob outputs were verified for correctness by comparing them against CUDA results for the same model and input.
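As a rough illustration of that comparison (not the exact verification script), per-token logprobs from an XPU run can be checked against a CUDA reference within a small tolerance:

```python
import math

def logprobs_close(xpu_logprobs, cuda_logprobs, atol=1e-2):
    """True if two per-token logprob sequences (same model/prompt) agree within atol."""
    return len(xpu_logprobs) == len(cuda_logprobs) and all(
        math.isclose(a, b, abs_tol=atol) for a, b in zip(xpu_logprobs, cuda_logprobs)
    )

# Hypothetical values collected from greedy runs on the two backends.
print(logprobs_close([-0.12, -1.37, -0.05], [-0.11, -1.39, -0.05]))  # True
```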
Checklist
- [x] Add at least a tag in the PR title.
  - Tag list: [[XPU], [Feature], [OP], [BugFix]]
  - You can add new tags based on the PR content, but the semantics must be clear.
- [x] Format your code, run pre-commit before commit. (All checks passed after running pre-commit multiple times to fix formatting and clang-format issues.)
- [x] Add unit tests. Please write the reason in this PR if no unit tests. (Unit tests added/modified to cover the args_utils.py platform check and the token_processor.py logprobs logic; an illustrative sketch follows this list.)
- [ ] Provide accuracy results. (N/A, feature enablement)
- [ ] If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.
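For reference, the added unit tests roughly correspond to checks of the following shape (pytest-style sketch reusing the hypothetical `validate_enable_logprob` helper from the Modifications section, repeated here so the example is self-contained; the real tests may be organized differently):

```python
import pytest

SUPPORTED_LOGPROB_PLATFORMS = {"cuda", "xpu"}

def validate_enable_logprob(enable_logprob: bool, platform: str) -> None:
    # Hypothetical stand-in for the check in fastdeploy/engine/args_utils.py.
    if enable_logprob and platform not in SUPPORTED_LOGPROB_PLATFORMS:
        raise ValueError(f"enable_logprob is not supported on '{platform}'")

def test_enable_logprob_allowed_on_xpu():
    validate_enable_logprob(True, "xpu")  # should not raise after this PR

def test_enable_logprob_rejected_on_unsupported_platform():
    with pytest.raises(ValueError):
        validate_enable_logprob(True, "some-other-platform")
```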
Thanks for your contribution!
Codecov Report
:x: Patch coverage is 61.53846% with 5 lines in your changes missing coverage. Please review.
:warning: Please upload report for BASE (develop@35f85ba). Learn more about missing BASE report.
| Files with missing lines | Patch % | Lines |
|---|---|---|
| fastdeploy/output/token_processor.py | 63.63% | 4 Missing :warning: |
| fastdeploy/engine/args_utils.py | 0.00% | 1 Missing :warning: |
Additional details and impacted files
```diff
@@            Coverage Diff             @@
##             develop    #5190   +/-   ##
==========================================
  Coverage           ?   60.52%
==========================================
  Files              ?      320
  Lines              ?    39059
  Branches           ?     5871
==========================================
  Hits               ?    23639
  Misses             ?    13554
  Partials           ?     1866
```
| Flag | Coverage Δ |
|---|---|
| GPU | 60.52% <61.53%> (?) |
Flags with carried forward coverage won't be shown. Click here to find out more.