Support Buffers, Blobs, or Streams inside experimental_streamData, not just JSON values.

Open ChristopherTrimboli opened this issue 2 years ago • 0 comments

Feature Description

I'm not sure of the technical constraints, and this may be impossible, but here is my use case, the bottleneck I ran into, and why this feature would improve things a lot.

I'm using PlayHT AI audio and want to attach the audio data alongside the text. Latency is important, so I want to do everything at once, inside the stream.

The key lines in question are:

      data.append({
        voiceData: Buffer.from(await resp.arrayBuffer()).toString("base64"),
      });

As you can see, I base64-encode the Buffer on the server and then decode it back to audio on the client, because data only supports JSON values.
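For reference, the client-side half of this workaround can be sketched roughly as below. The helper names `base64ToBytes` and `voiceDataToBlob` are made up for illustration and are not part of the `ai` package; the sketch assumes the base64 string has already been read out of the stream data on the client and that the audio is MP3.

```typescript
// Hypothetical helper (not part of the ai package): decode base64 into raw bytes.
function base64ToBytes(b64: string) {
  // atob returns a "binary string": each character holds one byte (0-255).
  const binary = atob(b64);
  return Uint8Array.from(binary, (ch) => ch.charCodeAt(0));
}

// Hypothetical helper: wrap the decoded bytes in a Blob the browser can play.
// "audio/mpeg" is assumed because the request asks PlayHT for mp3 output.
function voiceDataToBlob(voiceData: string, mimeType = "audio/mpeg") {
  return new Blob([base64ToBytes(voiceData)], { type: mimeType });
}

// Browser usage (assumes `voiceData` is the string appended on the server):
// const url = URL.createObjectURL(voiceDataToBlob(voiceData));
// await new Audio(url).play();
```

This extra decode step on every response is exactly the conversion that native Buffer or Blob support would remove.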

Some may say to use blob storage. I tried writing to Vercel Blob and passing a URL instead, but base64 was still faster. Ideally there would be no conversions at all: being able to send a Blob or Buffer directly in data would be very cool!
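To put a rough number on the cost of the base64 detour: standard padded base64 emits 4 characters for every 3 input bytes, so the payload grows by about a third before JSON quoting is even counted. A minimal sketch of that arithmetic follows; `base64Length` is an illustrative helper, and the 1 MB clip size is an assumed example, not a measurement from my app.

```typescript
// Illustrative helper: length of the padded base64 encoding of `byteLength` bytes.
// Every group of up to 3 bytes becomes exactly 4 output characters.
function base64Length(byteLength: number): number {
  return 4 * Math.ceil(byteLength / 3);
}

const audioBytes = 1_000_000; // assumed example clip size, not a measured value
const encodedChars = base64Length(audioBytes);
const overheadPercent = ((encodedChars - audioBytes) / audioBytes) * 100;

console.log(`${audioBytes} bytes -> ${encodedChars} base64 chars`);
console.log(`overhead: ${overheadPercent.toFixed(1)}%`);
```

Sending the raw bytes would avoid both this size increase and the encode/decode work on each side.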

Here is an example of my API:

export async function POST(req: Request) {
  // Extract the `messages` from the body of the request
  const { messages, personaName } = await req.json();

  // Request the OpenAI API for the response based on the prompt
  const aiResponse = await openai.chat.completions.create({
    model: "gpt-3.5-turbo",
    stream: true,
    messages: messages,
  });

  const data = new experimental_StreamData();

  const persona = await prisma.persona.findFirst({
    where: { name: personaName },
  });

  const stream = OpenAIStream(aiResponse, {
    onFinal: async (completion) => {
      const voicesFiltered = voices.filter(
        (v) =>
          v.voice_engine === "PlayHT2.0" &&
          v.gender === persona?.gender &&
          v.accent === persona?.accent
      );

      const resp = await fetch("https://api.play.ht/api/v2/tts/stream", {
        method: "POST",
        headers: {
          "Content-Type": "application/json",
          AUTHORIZATION: `${process.env.PLAYHT_SECRET_KEY}`,
          "X-USER-ID": process.env.PLAYHT_USER_ID!,
          accept: "audio/mpeg",
        },
        body: JSON.stringify({
          text: completion,
          voice:
            persona?.voiceId ??
            // Optional chaining avoids a crash when no voice matches the filter.
            voicesFiltered[Math.floor(Math.random() * voicesFiltered.length)]
              ?.id,
          output_format: "mp3",
          voice_engine: "PlayHT2.0-turbo",
        }),
      }).catch((err) => console.log("fetch error:", err));

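      if (!resp) {
        // Close StreamData on failure too, or the response will never finish.
        data.close();
        return;
      }

      // Hack: base64-encode the audio because StreamData only accepts JSON values.
      data.append({
        voiceData: Buffer.from(await resp.arrayBuffer()).toString("base64"),
      });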

      // IMPORTANT! you must close StreamData manually or the response will never finish.
      data.close();
    },
    // IMPORTANT! until this is stable, you must explicitly opt in to supporting streamData.
    experimental_streamData: true,
  });

  // Respond with the stream
  return new StreamingTextResponse(stream, {}, data);
}

Use Case

Streaming voice audio alongside AI text responses. There are probably many other Buffer use cases as well: images, webcam streams, etc.

Additional context

No response

Oct 28 '23 18:10 ChristopherTrimboli