
feat: add enforce_include_usage option

Open · max-wittig opened this pull request 5 months ago • 3 comments

Currently, when streaming, the usage is always null. This prevents enforcing per-user limits and is a bit unexpected, since without streaming the usage is always returned.

This is useful when vLLM sits behind a router, such as vllm-router or LiteLLM, and serves many users, where it is important to detect abuse, divide costs, etc.
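For context, the OpenAI-compatible API already lets a client opt in per request via stream_options. A minimal sketch of that existing workaround, reusing the host and model from the test plan below (the apikey header is this deployment's convention):

curl --request POST \
  --url https://yourhost.example.com/llm/v1/completions \
  --header 'apikey: {{token}}' \
  --header 'content-type: application/json' \
  --data '{
  "model": "qwen3-30b-a3b",
  "prompt": "def write_hello():",
  "max_tokens": 100,
  "stream": true,
  "stream_options": {"include_usage": true}
}'

The option proposed here makes the server behave as if every streaming request had set this, so a router in front of vLLM receives usage even from clients that do not ask for it.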

Essential Elements of an Effective PR Description Checklist

  • [x] The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • [ ] The test plan, such as providing test command.
  • [ ] The test results, such as pasting the results comparison before and after, or e2e results
  • [ ] (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.

Purpose

Always return token usage, which various systems can use for:

  • Detecting abuse by users
  • Billing, or dividing costs internally within a company

In addition, this aligns the behavior with the non-streaming mode, where usage is always returned (see the sketch below).
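As surfaced later in this thread, the option is exposed as a server flag. A sketch, assuming the flag name quoted in the comment below and reusing the model from the test plan:

vllm serve qwen3-30b-a3b --enable-force-include-usage

With the flag set, every streaming response ends with a chunk carrying the usage object, regardless of whether the client requested it via stream_options.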

Test Plan

  1. Check out this commit and run vLLM
  2. Send the following request and note how usage is returned in the last chunk
curl --request POST \
  --url https://yourhost.example.com/llm/v1/completions \
  --header 'apikey: {{token}}' \
  --header 'content-type: application/json' \
  --data '{
  "model": "qwen3-30b-a3b",
  "max_tokens": 100,
  "presence_penalty": 0,
  "frequency_penalty": 0,
  "temperature": 0.1,
  "prompt": "def write_hello():",
  "stream": true
}'

data: {"id":"cmpl-ef83ad46-8ca3-49dc-8371-790f281f60a1#8733163","object":"text_completion","created":1750142471,"model":"qwen3-30b-a3b","choices":[],"usage":{"prompt_tokens":4,"total_tokens":104,"completion_tokens":100}}

Test Result

(Optional) Documentation Update

max-wittig avatar Jun 16 '25 15:06 max-wittig

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small, essential subset of CI tests to catch errors quickly. You can run additional CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

github-actions[bot] avatar Jun 16 '25 15:06 github-actions[bot]

Local testing blocked by https://github.com/vllm-project/vllm/issues/15985

max-wittig avatar Jun 17 '25 11:06 max-wittig

@aarnphm Thanks for the review! Let me know if I should squash my commits or if any other changes are required!

max-wittig avatar Jun 20 '25 06:06 max-wittig

@aarnphm Thank you! Is there a place where I could put some docs for this feature?

max-wittig avatar Jun 25 '25 15:06 max-wittig

No need to. On https://docs.vllm.ai/en/latest/cli/index.html we reference the --help output, and you already include the help string for the option there.

aarnphm avatar Jun 25 '25 16:06 aarnphm

@max-wittig Hello, I used the vllm serve --enable-force-include-usage parameter, but the client request still needs to include "stream_options": {"include_usage": true} in the request body for usage information to be returned. If the stream_options parameter is not included, usage is still not returned. Is this the expected behavior of --enable-force-include-usage?

Wfd567 avatar Nov 21 '25 02:11 Wfd567

@Wfd567 That is because vLLM has not released a new version yet. This PR is not yet included in a release: https://github.com/vllm-project/vllm/pull/20983

max-wittig avatar Nov 21 '25 07:11 max-wittig