feat: add enforce_include_usage option
Currently, when streaming, the usage field is always null. This prevents enforcing per-user limits and is somewhat unexpected, since without streaming the usage is always returned.
This is useful when vLLM sits behind a router such as vllm-router or LiteLLM and serves many users, where usage information is important for detecting abuse, dividing costs, etc.
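For context, a minimal sketch of the client-side behavior today, using the OpenAI Python client against an OpenAI-compatible vLLM deployment (base URL, API key, and model are placeholders taken from the test plan below): without stream_options the streamed chunks carry no usage, and this option makes the server emit the final usage chunk regardless.

```python
from openai import OpenAI

# Placeholder endpoint and credentials; adjust to your deployment.
client = OpenAI(base_url="https://yourhost.example.com/llm/v1", api_key="token")

stream = client.completions.create(
    model="qwen3-30b-a3b",
    prompt="def write_hello():",
    max_tokens=100,
    stream=True,
    # Without the server-side option, usage is only emitted when the client
    # explicitly opts in via stream_options.
    stream_options={"include_usage": True},
)

for chunk in stream:
    if chunk.usage is not None:
        # The usage arrives in a final chunk with an empty choices list.
        print(chunk.usage)
```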
Essential Elements of an Effective PR Description Checklist
- [x] The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
- [ ] The test plan, such as providing test command.
- [ ] The test results, such as pasting the results comparison before and after, or e2e results
- [ ] (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
Purpose
Always return the token usage, which can be used by various systems for:
- Detecting abuse by users
- Billing or dividing costs internally within a company
In addition, this aligns the behavior with the non-streaming mode, where the usage is always returned.
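To make the billing/abuse-detection use case concrete, here is a hypothetical sketch of a proxy-side accumulator; the user_id key and record_usage helper are invented for illustration, and only the shape of the usage object follows the streamed response shown in the test plan.

```python
from collections import defaultdict

# Hypothetical per-user accounting; the accumulator and user_id are
# illustration only, not part of vLLM.
tokens_per_user: dict[str, int] = defaultdict(int)

def record_usage(user_id: str, chunk: dict) -> None:
    """Add the usage from a streamed chunk to the user's running total."""
    usage = chunk.get("usage")
    if usage is None:
        # Before this option, streamed chunks carried no usage at all,
        # so a proxy had nothing to bill against.
        return
    tokens_per_user[user_id] += usage["total_tokens"]
```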
Test Plan
- Check out this commit and run vLLM
- Send the following request and note that the usage is returned in the last segment
curl --request POST \
--url https://yourhost.example.com/llm/v1/completions \
--header 'apikey: {{token}}' \
--header 'content-type: application/json' \
--data '{
"model": "qwen3-30b-a3b",
"max_tokens": 100,
"presence_penalty": 0,
"frequency_penalty": 0,
"temperature": 0.1,
"prompt": "def write_hello():",
"stream": true
}'
data: {"id":"cmpl-ef83ad46-8ca3-49dc-8371-790f281f60a1#8733163","object":"text_completion","created":1750142471,"model":"qwen3-30b-a3b","choices":[],"usage":{"prompt_tokens":4,"total_tokens":104,"completion_tokens":100}}
Test Result
(Optional) Documentation Update
Local testing blocked by https://github.com/vllm-project/vllm/issues/15985
@aarnphm Thanks for the review! Let me know if I should squash my commits or if any other changes are required!
@aarnphm Thank you! Is there a place where I could put some docs for this feature?
No need to; https://docs.vllm.ai/en/latest/cli/index.html documents the --help output, and you already include the help string for this option.
@max-wittig hello, I used the vllm serve --enable-force-include-usage parameter, but the client request still needs to include "stream_options": {"include": true} in the request body to return usage information. If the stream_options parameter is not included, it still cannot return the usage. Is this the normal behavior for the --enable-force-include-usage parameter?
@Wfd567 That is because vllm has not released a new version yet. This PR is not yet released: https://github.com/vllm-project/vllm/pull/20983