
feat: Add guided decoding support to OpenAI frontend

Status: Open. pei0033 opened this issue 6 months ago • 0 comments

What does the PR do?

This PR adds comprehensive guided decoding support to the OpenAI frontend, enabling users to constrain model outputs to specific formats through the OpenAI-compatible API. The implementation supports both vLLM and TensorRT-LLM backends with multiple guide types including JSON schema, regex patterns, choice-based selection, and EBNF grammar.
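As a rough illustration of how a client might use the feature, the sketch below builds a completion request body that carries the two new fields named under "Where should the reviewer start?" (guided_decoding_guide_type and guided_decoding_guide). The guide-type strings ("json", "regex", "choice", "grammar"), the placeholder model name, the helper name build_completion_request, and passing every guide as a string are all assumptions for illustration. The README in this PR is the authoritative reference for the accepted values.

```python
import json

# Assumed guide-type strings; the PR's README defines the actual accepted values.
GUIDE_TYPES = ("json", "regex", "choice", "grammar")


def build_completion_request(model, prompt, guide_type, guide):
    """Build a hypothetical /v1/completions body with guided decoding fields."""
    if guide_type not in GUIDE_TYPES:
        raise ValueError(f"unsupported guide type: {guide_type!r}")
    return {
        "model": model,
        "prompt": prompt,
        "max_tokens": 64,
        # Field names come from the PR; the guide is assumed to be a string.
        "guided_decoding_guide_type": guide_type,
        "guided_decoding_guide": guide,
    }


# Constrain the output to a JSON object with a string name and an integer age.
schema = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "age": {"type": "integer"}},
    "required": ["name", "age"],
}
request = build_completion_request(
    "my-model", "Describe a person as JSON:", "json", json.dumps(schema)
)
print(json.dumps(request, indent=2))
```

With the official openai Python client, non-standard fields like these would typically be sent through the extra_body argument of client.completions.create. Whether the frontend reads them from the top level of the request body, as assumed here, should be checked against the PR.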

Checklist

  • [x] I have read the Contribution guidelines and signed the Contributor License Agreement
  • [x] PR title reflects the change and is of format <commit_type>: <Title>
  • [x] Changes are described in the pull request.
  • [ ] Related issues are referenced.
  • [ ] Populated GitHub labels field
  • [ ] Added test plan and verified test passes.
  • [ ] Verified that the PR passes existing CI.
  • [x] I ran pre-commit locally (pre-commit install, pre-commit run --all)
  • [x] Verified copyright is correct on all changed files.
  • [x] Added succinct git squash message before merging ref.
  • [ ] All template sections are filled out.
  • [ ] Optional: Additional screenshots for behavior/output changes with before/after.

Commit Type:

Check the conventional commit type box here and add the label to the GitHub PR.

  • [ ] build
  • [ ] ci
  • [ ] docs
  • [x] feat
  • [ ] fix
  • [ ] perf
  • [ ] refactor
  • [ ] revert
  • [ ] style
  • [ ] test

Related PRs:

N/A

Where should the reviewer start?

Please focus on these key files:

  1. python/openai/openai_frontend/schemas/openai.py - Review the new schema fields guided_decoding_guide_type and guided_decoding_guide added to both completion request models
  2. python/openai/openai_frontend/engine/utils/triton.py - Check the implementation of guided decoding integration for both vLLM and TensorRT-LLM backends
  3. python/openai/README.md - Verify the comprehensive documentation and examples for different guide types

Test plan:

Please follow the code examples in python/openai/README.md.

Caveats:

  1. Usage patterns differ between the vLLM and TensorRT-LLM backends.
  2. Guided decoding may not work correctly when combined with tool calling.
  3. The frontend does not currently validate guide format compatibility; validation is left to the backend.
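Because format validation is left to the backend, a client may want a cheap pre-flight check before sending a request. The helper below, check_guide, is hypothetical and not part of this PR. It reuses the same assumed guide-type strings and additionally assumes that a choice guide is a JSON-encoded list of strings. Python's re syntax may differ from the regex dialect a backend accepts, so this only catches gross errors.

```python
import json
import re

# Assumed guide-type strings; the PR's README defines the actual values.
GUIDE_TYPES = ("json", "regex", "choice", "grammar")


def check_guide(guide_type, guide):
    """Return None if the guide looks well-formed, otherwise an error message.

    Hypothetical client-side check; the frontend in this PR does not run it.
    """
    if guide_type not in GUIDE_TYPES:
        return f"unknown guide type: {guide_type!r}"
    if guide_type == "json":
        try:
            schema = json.loads(guide)
        except json.JSONDecodeError as exc:
            return f"invalid JSON schema: {exc}"
        if not isinstance(schema, dict):
            return "JSON schema must be an object"
    elif guide_type == "regex":
        try:
            # Python's re dialect may not match the backend's regex engine.
            re.compile(guide)
        except re.error as exc:
            return f"invalid regex: {exc}"
    elif guide_type == "choice":
        # Assumption: choices are sent as a JSON-encoded list of strings.
        try:
            choices = json.loads(guide)
        except json.JSONDecodeError as exc:
            return f"invalid choice list: {exc}"
        if not (isinstance(choices, list) and choices
                and all(isinstance(c, str) for c in choices)):
            return "choice guide must be a non-empty list of strings"
    elif not guide.strip():
        # "grammar": only reject empty input; EBNF parsing is left to the backend.
        return "empty grammar"
    return None


for gtype, guide in [("regex", r"\d{3}-\d{4}"), ("regex", "("),
                     ("choice", '["yes", "no"]')]:
    print(gtype, repr(guide), "->", check_guide(gtype, guide) or "ok")
```

A check like this cannot prove that a guide is compatible with a particular backend; it only rejects obviously malformed input before a round trip.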

Background

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

pei0033, Jun 11 '25 12:06