vllm [RFC]: Implement Structured Output support for V1 engine

Motivation.

Structured Output is supported in V0, but not yet in V1. One reason for the delay is that the V0 integration has had performance challenges, so we'd like to rethink the integration approach. We would also like to leave room for supporting additional techniques in the future, jump decoding in particular.

The document linked below covers the proposed integration of Structured Output functionality in V1 of the vLLM engine.

Proposed Change.

A draft proposal can be found in this Google Doc: https://docs.google.com/document/d/1H6m_Y3FLJ1FYGCmjXdZzoJv-JCDSxnKuSY2XiAj-c6c/edit?tab=t.0

This content will eventually be moved into a PR as an addition to the design docs section of the vLLM docs.

Related issue for closing xgrammar feature gaps: https://github.com/vllm-project/vllm/issues/12131

Feedback Period.

No response

CC List.

@mgoin @aarnphm @markmc @simon-mo @xuechendi @WoosukKwon

Any Other Things.

No response

Jan 09 '25 23:01 russellb

See https://github.com/vllm-project/vllm/pull/12388 for initial support in V1.

Jan 26 '25 23:01 aarnphm

The first iteration has been merged.

Mar 10 '25 13:03 russellb