[XPU] Support enable_logprob
Motivation
This PR adds Logprobs support for XPU (Kunlun chips) to the FastDeploy LLM inference engine.
Previously, Logprobs functionality was restricted to CUDA platforms, which prevented users from leveraging this sampling feature on XPU devices.
Modifications
This PR involves changes across the engine configuration, the worker logic, and the custom XPU operators.
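A minimal sketch of the kind of platform gate this touches (names and structure are illustrative, not the actual `fastdeploy/engine/args_utils.py` code):

```python
# Illustrative sketch only -- the real check lives in fastdeploy/engine/args_utils.py
# and its names/structure may differ.
SUPPORTED_LOGPROB_PLATFORMS = {"cuda", "xpu"}  # "xpu" is what this PR adds

def validate_enable_logprob(enable_logprob: bool, platform: str) -> None:
    """Fail fast when --enable-logprob is requested on an unsupported platform."""
    if enable_logprob and platform not in SUPPORTED_LOGPROB_PLATFORMS:
        raise ValueError(
            f"enable_logprob is not supported on platform '{platform}'; "
            f"supported platforms: {sorted(SUPPORTED_LOGPROB_PLATFORMS)}"
        )

validate_enable_logprob(True, "xpu")  # passes with this PR; previously only "cuda" did
```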
Usage or Command
On XPU, start the server with `--enable-logprob` and set `"logprobs": true` in the API request payload:
```bash
export XPU_VISIBLE_DEVICES="0"
python -m fastdeploy.entrypoints.openai.api_server \
    --model /work/PaddlePaddle/ERNIE-4.5-0.3B-Paddle \
    --port 8188 \
    --tensor-parallel-size 1 \
    --max-model-len 32768 \
    --max-num-seqs 128 \
    --quantization "wint8" \
    --gpu-memory-utilization 0.9 \
    --enable-logprob
```
```bash
curl -X POST "http://0.0.0.0:8188/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -d '{
        "messages": [
            {"role": "user", "content": "Hello! Please tell me a short story."}
        ],
        "logprobs": true,
        "top_logprobs": 5,
        "max_tokens": 50
    }'
```
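Since the server exposes an OpenAI-compatible endpoint, the same request can also be issued with the OpenAI Python client; the base URL, model path, and parameters below simply mirror the commands above:

```python
# Sketch: read per-token logprobs from the server started above.
# base_url, model path, and sampling parameters mirror the launch/curl commands.
from openai import OpenAI

client = OpenAI(base_url="http://0.0.0.0:8188/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="/work/PaddlePaddle/ERNIE-4.5-0.3B-Paddle",
    messages=[{"role": "user", "content": "Hello! Please tell me a short story."}],
    logprobs=True,
    top_logprobs=5,
    max_tokens=50,
)

# Each generated token carries its own logprob plus the top-5 alternatives.
for token_info in resp.choices[0].logprobs.content:
    alternatives = {t.token: round(t.logprob, 3) for t in token_info.top_logprobs}
    print(token_info.token, round(token_info.logprob, 3), alternatives)
```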
Accuracy Tests
This change affects the Logprobs output structure and platform support, not the core inference results. Logprob outputs were verified for correctness by comparing them against CUDA results for the same model and input.
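As a rough illustration of that comparison (not the exact verification script), per-token logprobs from an XPU run can be checked against a CUDA reference within a small tolerance:

```python
import math

def logprobs_close(xpu_logprobs, cuda_logprobs, atol=1e-2):
    """True if two per-token logprob sequences (same model/prompt) agree within atol."""
    return len(xpu_logprobs) == len(cuda_logprobs) and all(
        math.isclose(a, b, abs_tol=atol) for a, b in zip(xpu_logprobs, cuda_logprobs)
    )

# Hypothetical values collected from greedy runs on the two backends.
print(logprobs_close([-0.12, -1.37, -0.05], [-0.11, -1.39, -0.05]))  # True
```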
Checklist
- [x] Add at least a tag in the PR title.
  - Tag list: [[XPU], [Feature], [OP], [BugFix]]
  - You can add new tags based on the PR content, but the semantics must be clear.
- [x] Format your code, run pre-commit before commit. (All checks passed after running pre-commit multiple times to fix formatting and clang-format issues.)
- [x] Add unit tests. Please write the reason in this PR if no unit tests. (Unit tests added/modified to cover the args_utils.py platform check and the token_processor.py logprobs logic; an illustrative sketch follows this list.)
- [ ] Provide accuracy results. (N/A, feature enablement)
- [ ] If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.
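For reference, the added unit tests roughly correspond to checks of the following shape (pytest-style sketch reusing the hypothetical `validate_enable_logprob` helper from the Modifications section, repeated here so the example is self-contained; the real tests may be organized differently):

```python
import pytest

SUPPORTED_LOGPROB_PLATFORMS = {"cuda", "xpu"}

def validate_enable_logprob(enable_logprob: bool, platform: str) -> None:
    # Hypothetical stand-in for the check in fastdeploy/engine/args_utils.py.
    if enable_logprob and platform not in SUPPORTED_LOGPROB_PLATFORMS:
        raise ValueError(f"enable_logprob is not supported on '{platform}'")

def test_enable_logprob_allowed_on_xpu():
    validate_enable_logprob(True, "xpu")  # should not raise after this PR

def test_enable_logprob_rejected_on_unsupported_platform():
    with pytest.raises(ValueError):
        validate_enable_logprob(True, "some-other-platform")
```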
Thanks for your contribution!
Codecov Report
:x: Patch coverage is 61.53846% with 5 lines in your changes missing coverage. Please review.
:warning: Please upload report for BASE (develop@35f85ba). Learn more about missing BASE report.
| Files with missing lines | Patch % | Lines |
|---|---|---|
| fastdeploy/output/token_processor.py | 63.63% | 4 Missing :warning: |
| fastdeploy/engine/args_utils.py | 0.00% | 1 Missing :warning: |
Additional details and impacted files
```diff
@@            Coverage Diff             @@
##             develop    #5190   +/-   ##
==========================================
  Coverage           ?   60.52%
==========================================
  Files              ?      320
  Lines              ?    39059
  Branches           ?     5871
==========================================
  Hits               ?    23639
  Misses             ?    13554
  Partials           ?     1866
```
| Flag | Coverage Δ |
|---|---|
| GPU | 60.52% <61.53%> (?) |
Flags with carried forward coverage won't be shown. Click here to find out more.