chat/completions endpoint with structured generation support
New Feature: chat/completions style endpoint with structured generation support.
Background
When serving outlines with vLLM and interacting with it over HTTP, only the /generate endpoint is currently available. However, there is a need for a chat/completions equivalent that supports structured generation and streaming.
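For context, the existing /generate endpoint takes the prompt and the structure specification together in a single request body. A minimal sketch of such a request follows; the field names ("prompt", "schema") are assumptions for illustration, not a confirmed API:

```python
import json

# Hypothetical request body for the existing /generate endpoint.
# The "schema" field carries a JSON Schema as a string; both field
# names are assumptions made for this example.
payload = {
    "prompt": "Give me a character description.",
    "schema": json.dumps({
        "type": "object",
        "properties": {
            "name": {"type": "string"},
            "age": {"type": "integer"},
        },
        "required": ["name", "age"],
    }),
}

# The body that would be POSTed to /generate.
body = json.dumps(payload)
```

Note that this shape has no notion of a conversation: the caller must flatten any chat history into a single prompt string, which is the gap this proposal addresses.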
Proposed Solution
Implement OpenAI-compatible endpoint functionality with special handling for the metadata object, specifically using a key called structure. This approach would allow:
- Structuring inputs like a conversation with alternating user messages and assistant responses.
- Having the next response use structured generation.
- Streaming the output, so users don't receive the full completion at once and have to construct the chat history manually.
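A request under this proposal might look like the following sketch. The metadata.structure key is the convention proposed above, not an existing OpenAI field, and the model name is a placeholder:

```python
import json

# JSON Schema that the next assistant response should conform to.
schema = {
    "type": "object",
    "properties": {
        "city": {"type": "string"},
        "temperature_c": {"type": "number"},
    },
    "required": ["city", "temperature_c"],
}

# Sketch of an OpenAI-style chat/completions request body carrying the
# schema in metadata["structure"] (the key proposed here; hypothetical).
request = {
    "model": "mistralai/Mistral-7B-Instruct-v0.2",  # placeholder model
    "messages": [
        {"role": "user", "content": "What's the weather in Paris?"},
        {"role": "assistant", "content": "Let me check."},
        {"role": "user", "content": "Reply as JSON."},
    ],
    "stream": True,
    "metadata": {"structure": json.dumps(schema)},
}
```

Everything except the metadata key matches the standard chat/completions shape, so existing OpenAI client libraries could send this request unchanged.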
Implementation Details
- Utilize the OpenAI API's `metadata` object functionality.
- Add special handling for a `structure` key within the `metadata` object.
- Implement streaming support for the structured output.
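On the server side, the handling could be as small as the two helpers below: one that pulls the proposed `structure` key out of the request, and one that formats generated tokens as chat.completion.chunk server-sent events. This is a sketch under the naming assumptions of this proposal, not vLLM's actual implementation:

```python
import json
from typing import Iterator, Optional


def extract_structure(body: dict) -> Optional[dict]:
    """Return the parsed JSON Schema from metadata["structure"],
    or None if the client did not request structured generation.
    (Key names follow this proposal, not an existing OpenAI field.)"""
    raw = (body.get("metadata") or {}).get("structure")
    return json.loads(raw) if raw else None


def sse_chunks(tokens: Iterator[str]) -> Iterator[str]:
    """Format generated tokens as chat.completion.chunk SSE events,
    mirroring the shape OpenAI-compatible streaming clients expect."""
    for tok in tokens:
        event = {
            "object": "chat.completion.chunk",
            "choices": [{"index": 0, "delta": {"content": tok}}],
        }
        yield f"data: {json.dumps(event)}\n\n"
    yield "data: [DONE]\n\n"
```

When `extract_structure` returns a schema, the server would build the corresponding guided-generation logits processor before sampling; when it returns None, the request falls through to ordinary unconstrained chat completion.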
Benefits
- Improved compatibility with chat-based applications.
- Enhanced user experience through streaming responses.
- Easier integration for developers familiar with OpenAI's chat/completions API.
Resources
- OpenAI metadata usage: https://community.openai.com/t/how-does-the-assistant-api-use-the-metadata-field/481096
- OpenAI API reference for metadata: https://platform.openai.com/docs/api-reference/batch/create#batch-create-metadata
Next Steps
- Discuss the feasibility and design of this feature.
- Outline specific implementation steps.
- Assign developers to work on the feature (Lee has offered to contribute if time allows).
Related Discussions
https://discord.com/channels/1182316225284554793/1182592312669372427/1260988449238814802
Please feel free to provide any feedback or suggestions to improve this proposal.
Is this resolved by https://github.com/vllm-project/vllm/pull/7654?