feat: Add guided decoding support to OpenAI frontend

Open pei0033 opened this issue 6 months ago • 0 comments

What does the PR do?

This PR adds comprehensive guided decoding support to the OpenAI frontend, enabling users to constrain model outputs to specific formats through the OpenAI-compatible API. The implementation supports both vLLM and TensorRT-LLM backends with multiple guide types including JSON schema, regex patterns, choice-based selection, and EBNF grammar.

Checklist

[x] I have read the Contribution guidelines and signed the Contributor License Agreement
[x] PR title reflects the change and is of format <commit_type>: <Title>
[x] Changes are described in the pull request.
[ ] Related issues are referenced.
[ ] Populated github labels field
[ ] Added test plan and verified test passes.
[ ] Verified that the PR passes existing CI.
[x] I ran pre-commit locally (pre-commit install, pre-commit run --all)
[x] Verified copyright is correct on all changed files.
[x] Added succinct git squash message before merging ref.
[ ] All template sections are filled out.
[ ] Optional: Additional screenshots for behavior/output changes with before/after.

Commit Type:

Check the conventional commit type box here and add the label to the github PR.

[ ] build
[ ] ci
[ ] docs
[x] feat
[ ] fix
[ ] perf
[ ] refactor
[ ] revert
[ ] style
[ ] test

Related PRs:

N/A

Where should the reviewer start?

Please focus on these key files:

python/openai/openai_frontend/schemas/openai.py - Review the new schema fields guided_decoding_guide_type and guided_decoding_guide added to both completion request models
python/openai/openai_frontend/engine/utils/triton.py - Check the implementation of guided decoding integration for both vLLM and TensorRT-LLM backends
python/openai/README.md - Verify the comprehensive documentation and examples for different guide types

Test plan:

please follow the codes in README.md

Caveats:

Different usage patterns per backend:
Guided decoding may not function properly when used in conjunction with tool calling
Currently relies on backend validation; frontend doesn't validate guide format compatibility

Background

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

Jun 11 '25 12:06 pei0033