fast-llm-security-guardrails
[Feature Request] - streaming support for API calls
WHAT? Add streaming support for API responses.
WHY? Improves user experience for long or slow completions.
Additional requirements
- Support TypeScript.
- Support serverless JS runtimes such as Cloudflare Pages and Vercel.
- If this feature requires creating a TypeScript client, consider returning a ReadableStream (see the sketch after this list).
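For illustration only, here is one possible shape such a TypeScript client could expose; the type and method names below are hypothetical and not part of the current API. Returning a web-standard ReadableStream keeps the client usable in Node 18+, Cloudflare Pages Functions, and Vercel serverless/edge runtimes alike:

// Hypothetical client surface, for illustration only; none of these names exist in the current API.
interface GuardrailsStreamClient {
  // Resolving to a web-standard ReadableStream of UTF-8 encoded chunks works the same
  // in Node 18+, Cloudflare Pages Functions, and Vercel serverless/edge runtimes.
  streamCompletion(input: { prompt: string }): Promise<ReadableStream<Uint8Array>>;
}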
REFERENCE OpenAI supports streaming for both the Chat Completions and the Assistants API: https://platform.openai.com/docs/api-reference/streaming
For reference, this is how I currently have to handle OpenAI's stream responses in my backend and forward them to the frontend:
const textStream = await openai.chat.completions.create({
  model: 'gpt-4',
  messages: [
    { role: 'system', content: SYSTEM_PROMPT },
    { role: 'user', content: userText }
  ],
  stream: true
});

const encoder = new TextEncoder();

return new Response(
  new ReadableStream({
    async start(controller) {
      // Forward each chunk from the original stream
      for await (const chunk of textStream) {
        // Extract the content per the OpenAI API response structure
        const message = chunk.choices[0]?.delta?.content || '';
        controller.enqueue(encoder.encode(message));
      }
      // Close the stream once all chunks are processed
      controller.close();
    },
    cancel() {
      console.log('cancel and abort');
    }
  }),
  {
    headers: {
      'cache-control': 'no-cache',
      'Content-Type': 'text/event-stream'
    }
  }
);
Ideally, the API should return a ReadableStream, so all I would need to do is wrap it in a Response.
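As a minimal sketch, assuming a hypothetical client method streamCompletion that resolves to a ReadableStream<Uint8Array> (the name and options are made up for illustration), the whole route handler would reduce to:

// Sketch only: 'client.streamCompletion' is a hypothetical method assumed to resolve
// to a web-standard ReadableStream<Uint8Array>; it is not an existing API.
declare const client: {
  streamCompletion(input: { prompt: string }): Promise<ReadableStream<Uint8Array>>;
};

export async function POST(req: Request): Promise<Response> {
  const { userText } = await req.json();
  const stream = await client.streamCompletion({ prompt: userText });
  // No manual chunk handling: hand the stream straight to the Response.
  return new Response(stream, {
    headers: {
      'cache-control': 'no-cache',
      'Content-Type': 'text/event-stream'
    }
  });
}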