
[XPU]Support enable_logprob

Open qw86972190 opened this issue 1 month ago • 2 comments

Motivation

This PR primarily adds Logprobs support for XPU (Kunlun Chip) on the FastDeploy LLM inference engine.

Previously, Logprobs support was limited to CUDA platforms, so users on XPU devices could not use it.

Modifications

This PR changes configuration handling, worker logic, and the custom XPU operators.
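For readers unfamiliar with the feature, the following is a minimal sketch of what a top-k logprobs computation produces for a single decoding step. It is a plain-Python illustration only, not the XPU operator or the worker code from this PR; the function name `top_logprobs` is hypothetical.

```python
import math


def top_logprobs(logits, k):
    """Return (token_index, logprob) pairs for the k most likely tokens.

    Illustrative only: the real computation runs inside device operators.
    """
    # Subtract the max logit before exponentiating for numerical stability.
    max_logit = max(logits)
    log_z = max_logit + math.log(sum(math.exp(x - max_logit) for x in logits))
    # log_softmax(x_i) = x_i - log(sum_j exp(x_j))
    logprobs = [x - log_z for x in logits]
    # Rank token indices by log probability, highest first.
    ranked = sorted(range(len(logits)), key=lambda i: logprobs[i], reverse=True)
    return [(i, logprobs[i]) for i in ranked[:k]]
```

Calling `top_logprobs([1.0, 3.0, 2.0], 2)` ranks token index 1 first and index 2 second, which corresponds to what `top_logprobs: 2` would request in the API.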

Usage or Command

This feature is available on XPU when the server is started with --enable-logprob (as in the command below) and the API request payload sets logprobs: true:

export XPU_VISIBLE_DEVICES="0"
python -m fastdeploy.entrypoints.openai.api_server \
  --model /work/PaddlePaddle/ERNIE-4.5-0.3B-Paddle \
  --port 8188 \
  --tensor-parallel-size 1 \
  --max-model-len 32768 \
  --max-num-seqs 128 \
  --quantization "wint8" \
  --gpu-memory-utilization 0.9 \
  --enable-logprob

curl -X POST "http://0.0.0.0:8188/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{ "messages": [ {"role": "user", "content": "Hello! Please tell me a short story."} ], "logprobs": true, "top_logprobs": 5, "max_tokens": 50 }'
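The same request can be sent from Python. The sketch below uses only the standard library and assumes the server returns an OpenAI-style chat completion in which per-token entries live under `choices[0].logprobs.content`; that response layout is an assumption here and is not confirmed by this PR. The helper names `build_payload`, `extract_token_logprobs`, and `query` are hypothetical.

```python
import json
import urllib.request


def build_payload(prompt, top_logprobs=5, max_tokens=50):
    """Build the same request body as the curl example."""
    return {
        "messages": [{"role": "user", "content": prompt}],
        "logprobs": True,
        "top_logprobs": top_logprobs,
        "max_tokens": max_tokens,
    }


def extract_token_logprobs(response):
    """Return (token, logprob) pairs from a response dict.

    Assumes an OpenAI-style layout (choices[0].logprobs.content);
    returns an empty list when no logprobs are present.
    """
    logprobs = response["choices"][0].get("logprobs") or {}
    content = logprobs.get("content") or []
    return [(entry["token"], entry["logprob"]) for entry in content]


def query(url, prompt):
    """POST the payload to the server and decode the JSON reply."""
    request = urllib.request.Request(
        url,
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as reply:
        return json.load(reply)
```

With the server from the command above running, `extract_token_logprobs(query("http://0.0.0.0:8188/v1/chat/completions", "Hello! Please tell me a short story."))` would list each generated token with its log probability, provided the response follows the assumed layout.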

Accuracy Tests

This change affects the Logprobs output structure and platform support, not the core inference results. Logprobs were checked for correctness by comparing them with CUDA results on the same model and input.
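A comparison of this kind could be scripted roughly as follows. This is a sketch under stated assumptions, not the procedure used for this PR: it assumes both runs use deterministic (greedy) decoding so the token sequences match, and that each run has been reduced to a list of (token, logprob) pairs. The function name `max_logprob_diff` is hypothetical, and any acceptance threshold is left to the reader.

```python
def max_logprob_diff(reference, candidate):
    """Return the largest absolute logprob gap between two runs.

    Each argument is a list of (token, logprob) pairs, for example one
    from a CUDA run and one from an XPU run on the same prompt.
    Raises ValueError if the token sequences differ.
    """
    if len(reference) != len(candidate):
        raise ValueError("runs produced different numbers of tokens")
    worst = 0.0
    for position, ((ref_tok, ref_lp), (cand_tok, cand_lp)) in enumerate(
        zip(reference, candidate)
    ):
        # A token mismatch means the outputs diverged, so the logprobs
        # at later positions are no longer comparable.
        if ref_tok != cand_tok:
            raise ValueError(f"token mismatch at position {position}")
        worst = max(worst, abs(ref_lp - cand_lp))
    return worst
```

The returned value can then be checked against whatever tolerance is appropriate for the quantization and hardware in use.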

Checklist

[X] Add at least one tag to the PR title.
  - Tag list: [[XPU], [Feature], [OP], [BugFix]]
  - You can add new tags based on the PR content, but the semantics must be clear.

[X] Format your code, run pre-commit before commit. (All checks passed after running pre-commit multiple times to fix formatting and clang-format issues.)

[X] Add unit tests. Please write the reason in this PR if no unit tests. (Unit tests added/modified to cover args_utils.py platform check and token_processor.py logprobs logic.)

[ ] Provide accuracy results. (N/A, feature enablement)

[ ] If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

qw86972190, Nov 24 '25 07:11

Thanks for your contribution!

paddle-bot[bot], Nov 24 '25 07:11

Codecov Report

Patch coverage is 61.53846% with 5 lines in your changes missing coverage. Please review.
Warning: Please upload report for BASE (develop@35f85ba).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| fastdeploy/output/token_processor.py | 63.63% | 4 Missing |
| fastdeploy/engine/args_utils.py | 0.00% | 1 Missing |

Additional details and impacted files
@@            Coverage Diff             @@
##             develop    #5190   +/-   ##
==========================================
  Coverage           ?   60.52%           
==========================================
  Files              ?      320           
  Lines              ?    39059           
  Branches           ?     5871           
==========================================
  Hits               ?    23639           
  Misses             ?    13554           
  Partials           ?     1866           
| Flag | Coverage Δ |
|---|---|
| GPU | 60.52% <61.53%> (?) |


codecov-commenter, Nov 24 '25 11:11