feat: Add guided decoding support to OpenAI frontend
What does the PR do?
This PR adds comprehensive guided decoding support to the OpenAI frontend, enabling users to constrain model outputs to specific formats through the OpenAI-compatible API. The implementation supports both vLLM and TensorRT-LLM backends with multiple guide types including JSON schema, regex patterns, choice-based selection, and EBNF grammar.
Checklist
- [x] I have read the Contribution guidelines and signed the Contributor License Agreement
- [x] PR title reflects the change and is of format <commit_type>: <Title>
- [x] Changes are described in the pull request.
- [ ] Related issues are referenced.
- [ ] Populated github labels field
- [ ] Added test plan and verified test passes.
- [ ] Verified that the PR passes existing CI.
- [x] I ran pre-commit locally (pre-commit install, pre-commit run --all)
- [x] Verified copyright is correct on all changed files.
- [x] Added succinct git squash message before merging ref.
- [ ] All template sections are filled out.
- [ ] Optional: Additional screenshots for behavior/output changes with before/after.
Commit Type:
Check the conventional commit type box here and add the label to the github PR.
- [ ] build
- [ ] ci
- [ ] docs
- [x] feat
- [ ] fix
- [ ] perf
- [ ] refactor
- [ ] revert
- [ ] style
- [ ] test
Related PRs:
N/A
Where should the reviewer start?
Please focus on these key files:
- `python/openai/openai_frontend/schemas/openai.py`: Review the new schema fields `guided_decoding_guide_type` and `guided_decoding_guide` added to both completion request models
- `python/openai/openai_frontend/engine/utils/triton.py`: Check the implementation of guided decoding integration for both vLLM and TensorRT-LLM backends
- `python/openai/README.md`: Verify the documentation and examples for the different guide types
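For orientation while reviewing the schema change, here is a minimal sketch of a completion request body carrying the two new fields. Only the field names `guided_decoding_guide_type` and `guided_decoding_guide` come from this PR; the guide-type strings, the shape of the guide value, the model name, and the helper `build_guided_request` are assumptions made for illustration, so check README.md for the accepted values.

```python
import json

# Hypothetical guide-type strings; the values the frontend accepts may differ.
ASSUMED_GUIDE_TYPES = {"json_schema", "regex", "choice", "ebnf"}


def build_guided_request(model, prompt, guide_type, guide):
    """Build a completion request body with the two guided decoding fields."""
    if guide_type not in ASSUMED_GUIDE_TYPES:
        raise ValueError(f"unknown guide type: {guide_type!r}")
    return {
        "model": model,
        "prompt": prompt,
        # Field names as listed in this PR's schema changes.
        "guided_decoding_guide_type": guide_type,
        "guided_decoding_guide": guide,
    }


# Example: constrain the answer to one of two choices (value shape is assumed).
body = build_guided_request(
    "my-model",  # placeholder model name
    "Is the sky blue? Answer yes or no.",
    "choice",
    ["yes", "no"],
)
print(json.dumps(body, indent=2))
```

The resulting dictionary would be sent as the JSON body of a completions request; a chat completions request would presumably carry the same two fields alongside `messages` instead of `prompt`.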
Test plan:
Please follow the code examples in README.md.
Caveats:
- Different usage patterns per backend:
- Guided decoding may not function properly when used in conjunction with tool calling
- Currently relies on backend validation; frontend doesn't validate guide format compatibility
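Because the frontend does not validate guides itself, callers may want a quick client-side sanity check before sending a request. The sketch below is one possible approach and is not part of this PR: the function `precheck_guide`, the guide-type strings, and the expected guide shapes are all assumptions. Python's `re` dialect may also differ from the one a backend uses, so passing this check does not guarantee the backend accepts the guide.

```python
import json
import re


def precheck_guide(guide_type, guide):
    """Return None if the guide looks well-formed, otherwise an error string.

    Guide-type names and value shapes here are assumptions for illustration.
    """
    if guide_type == "regex":
        try:
            re.compile(guide)
        except (re.error, TypeError) as exc:
            return f"invalid regex: {exc}"
    elif guide_type == "json_schema":
        # Accept either a JSON string or an already-parsed object.
        if isinstance(guide, str):
            try:
                guide = json.loads(guide)
            except json.JSONDecodeError as exc:
                return f"invalid JSON: {exc}"
        if not isinstance(guide, dict):
            return "JSON schema must be an object"
    elif guide_type == "choice":
        is_str_list = isinstance(guide, list) and all(
            isinstance(choice, str) for choice in guide
        )
        if not is_str_list or not guide:
            return "choice guide must be a non-empty list of strings"
    # Other types (for example EBNF grammars) are left to the backend.
    return None


print(precheck_guide("regex", "[a-z"))        # an error string
print(precheck_guide("choice", ["yes", "no"]))  # None
```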