
chat/completions endpoint with structured generation support

Open Imagineer99 opened this issue 1 year ago • 2 comments

New Feature: chat/completions style endpoint with structured generation support.

Background

When serving outlines with vLLM over HTTP, only the /generate endpoint is currently available. However, there's a need for a chat/completions equivalent that supports structured generation and streaming.

Proposed Solution

Implement an OpenAI-compatible chat/completions endpoint with special handling for the metadata object, specifically a key called structure. This approach would allow:

  1. Structuring inputs like a conversation with alternating user messages and assistant responses.
  2. Having the next response use structured generation.
  3. Streaming the output, so users don't receive the full completion at once and have to construct the chat history manually.

Implementation Details

  • Utilize the OpenAI API's metadata object functionality.
  • Add special handling for a structure key within the metadata object.
  • Implement streaming support for the structured output.
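As a rough illustration of the streaming piece, the server could wrap tokens from the constrained generator as OpenAI-style `chat.completion.chunk` server-sent events. The function below is a hypothetical sketch; the structured generation itself (constraining tokens to the schema) would happen upstream in outlines/vLLM:

```python
import json
from typing import Iterable, Iterator

def stream_chunks(token_iter: Iterable[str], model: str) -> Iterator[str]:
    """Wrap generated tokens as OpenAI-style chat.completion.chunk SSE events.

    Sketch only: token_iter stands in for a schema-constrained token stream
    produced by the structured-generation backend.
    """
    for token in token_iter:
        chunk = {
            "object": "chat.completion.chunk",
            "model": model,
            "choices": [
                {"index": 0, "delta": {"content": token}, "finish_reason": None}
            ],
        }
        yield f"data: {json.dumps(chunk)}\n\n"
    # Final sentinel, mirroring the OpenAI streaming protocol.
    yield "data: [DONE]\n\n"

# Example: pieces of structured output a constrained generator might emit.
events = list(stream_chunks(['{"name": "Alice", ', '"age": 30}'], "my-model"))
```

This lets existing OpenAI streaming clients consume the structured output incrementally, satisfying point 3 of the proposed solution.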

Benefits

  • Improved compatibility with chat-based applications.
  • Enhanced user experience through streaming responses.
  • Easier integration for developers familiar with OpenAI's chat/completions API.

Resources

  • OpenAI metadata usage: https://community.openai.com/t/how-does-the-assistant-api-use-the-metadata-field/481096
  • OpenAI API reference for metadata: https://platform.openai.com/docs/api-reference/batch/create#batch-create-metadata

Next Steps

  • Discuss the feasibility and design of this feature.
  • Outline specific implementation steps.
  • Assign developers to work on the feature (Lee has offered to contribute if time allows).

Related Discussions

https://discord.com/channels/1182316225284554793/1182592312669372427/1260988449238814802


Please feel free to provide any feedback or suggestions to improve this proposal.

Imagineer99 avatar Jul 15 '24 15:07 Imagineer99

Is this resolved by https://github.com/vllm-project/vllm/pull/7654?

lapp0 avatar Sep 14 '24 19:09 lapp0

> Is this resolved by vllm-project/vllm#7654?

Looks like it does!

Imagineer99 avatar Sep 14 '24 19:09 Imagineer99