
[Feature Request] - streaming support for API calls

Open Sebusml opened this issue 1 year ago • 1 comment

WHAT? Add streaming support for API responses.

WHY? Improves user experience for long or slow completions.

Additional requirements

  • Support TypeScript (TS).
  • Support serverless JS runtimes such as Cloudflare Pages and Vercel.
  • If this feature requires creating a TS client, consider returning a ReadableStream.

REFERENCE OpenAI already supports streaming in both the Chat Completions and Assistants APIs: https://platform.openai.com/docs/api-reference/streaming

Sebusml · Apr 05 '24 06:04

As a reference, this is how I currently have to handle OpenAI's streamed responses in my backend and forward them to the frontend:

  const textStream = await openai.chat.completions.create({
    model: 'gpt-4',
    messages: [
      { role: 'system', content: SYSTEM_PROMPT },
      { role: 'user', content: userText }
    ],
    stream: true
  });
  const encoder = new TextEncoder();
  return new Response(
    new ReadableStream({
      async start(controller) {
        // Forward each chunk from the OpenAI stream to the client
        for await (const chunk of textStream) {
          // Extract the token text per OpenAI's chunk response structure
          const message = chunk.choices[0]?.delta?.content || '';
          controller.enqueue(encoder.encode(message));
        }

        // Close the stream once all chunks are processed
        controller.close();
      },
      cancel() {
        console.log('cancel and abort');
      }
    }),
    {
      headers: {
        'cache-control': 'no-cache',
        'Content-Type': 'text/event-stream'
      }
    }
  );
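
For completeness, the frontend then just reads that Response body with a stream reader. A minimal consumption sketch (the /api/chat route and renderToken callback are hypothetical placeholders, not part of this library):

  const response = await fetch('/api/chat', {
    method: 'POST',
    body: JSON.stringify({ userText })
  });
  if (!response.body) throw new Error('No response body');

  const reader = response.body.getReader();
  const decoder = new TextDecoder();

  // Read until the backend calls controller.close()
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    // renderToken is a hypothetical UI callback that appends the token
    renderToken(decoder.decode(value, { stream: true }));
  }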

Ideally, this library's API would return a ReadableStream directly, so all I would need to do is wrap it in a Response.
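
For example, a streaming-aware TS client could reduce the whole handler to something like the sketch below. This only illustrates the requested shape; GuardrailsClient, completions.create, and GUARDRAILS_API_KEY are illustrative assumptions, not the library's current API:

  // Hypothetical client that resolves to a ReadableStream when stream: true
  const guard = new GuardrailsClient({ apiKey: process.env.GUARDRAILS_API_KEY });

  const stream: ReadableStream<Uint8Array> = await guard.completions.create({
    prompt: userText,
    stream: true
  });

  // The route handler then reduces to wrapping the stream in a Response
  return new Response(stream, {
    headers: {
      'cache-control': 'no-cache',
      'Content-Type': 'text/event-stream'
    }
  });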

Sebusml · Apr 05 '24 06:04