
Add `maxToolRoundtrips` option to `streamText` settings

Open baptisteArno opened this issue 1 year ago • 9 comments

Feature Description

Would be similar to how this setting works with generateText.

Use Case

In the examples, when streaming with a tool, it's the client (useChat) that makes the request again after a tool has been called. It would be great if that were handled automatically by the server.
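
For context, a minimal sketch of the existing option on `generateText` that this request asks to mirror (the model, tool, and prompt are placeholders, not part of the original issue):

import { openai } from '@ai-sdk/openai';
import { generateText, tool } from 'ai';
import { z } from 'zod';

const { text } = await generateText({
  model: openai('gpt-4o'),
  tools: {
    weather: tool({
      description: 'Get the weather in a location',
      parameters: z.object({ location: z.string() }),
      execute: async ({ location }) => ({ location, temperature: 21 }),
    }),
  },
  // the server keeps calling tools and re-prompting the model, up to 5 times
  maxToolRoundtrips: 5,
  prompt: 'What is the weather in Tokyo?',
});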

Additional context

No response

baptisteArno avatar Jun 13 '24 11:06 baptisteArno

It's planned. The integration with useChat makes this significantly more complex, though.

lgrammel avatar Jun 13 '24 11:06 lgrammel

Can I help?

Or is there any known workaround to handle that on the server?

baptisteArno avatar Jun 13 '24 12:06 baptisteArno

I'd also be curious to hear if you have a workaround in mind @lgrammel / others.

I spent yesterday working on recursion, and from reading the source I really couldn't decide what to do without feeling like I was just applying band-aids that will get washed away by the next update, once you do push this change.

I'm essentially wanting to use streamUI itself as an async generator; any thoughts?

shaded-blue avatar Jun 15 '24 17:06 shaded-blue

Workaround for now (simplified example), wrapping the readable stream into another one:

// `streamText`, `ToolCallPart`, and `ToolResultPart` come from the AI SDK;
// `Props`, `model`, `parseChatCompletionMessages`, and `parseTools` are
// app-specific and assumed to be defined elsewhere.
import { streamText, type ToolCallPart, type ToolResultPart } from 'ai'

const maxToolCalls = 5

export const runOpenAIChatCompletionStream = async ({
  credentials: { apiKey },
  options,
  variables,
  config: openAIConfig,
  compatibility,
  totalToolCalls = 0,
  toolResults,
  toolCalls,
}: Props) => {
  const response = await streamText({
    model,
    temperature: options.temperature ? Number(options.temperature) : undefined,
    messages: await parseChatCompletionMessages({
      options,
      variables,
      toolCalls,
      toolResults,
    }),
    tools: parseTools({ tools: options.tools, variables }),
  })


  // Wrap the SDK stream so tool round-trips can continue server-side:
  // after the inner stream ends with tool calls, recurse and pipe the
  // next stream's chunks into this same ReadableStream.
  return new ReadableStream({
    async start(controller) {
      const reader = response.toAIStream().getReader()

      async function pump(reader: ReadableStreamDefaultReader<Uint8Array>) {
        const { done, value } = await reader.read()

        if (done) {
          toolCalls = (await response.toolCalls) as ToolCallPart[]
          toolResults = (await response.toolResults) as
            | ToolResultPart[]
            | undefined
          return
        }

        controller.enqueue(value)
        return pump(reader)
      }

      await pump(reader)

      // If the model requested tools, run another round-trip, passing the
      // running call count along so `maxToolCalls` caps the recursion depth.
      if (toolCalls && toolCalls.length > 0 && totalToolCalls < maxToolCalls) {
        totalToolCalls += 1
        const newStream = await runOpenAIChatCompletionStream({
          credentials: { apiKey },
          options,
          variables,
          config: openAIConfig,
          compatibility,
          totalToolCalls,
          toolCalls,
          toolResults,
        })
        if (newStream) await pump(newStream.getReader())
      }

      controller.close()
    },
  })
}

Am I doing this correctly? I'm not super familiar with streams. I tested it out briefly and it seems to work as expected.
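
For reference, a minimal sketch of how such a wrapped stream could be returned from a route handler, assuming the AI SDK v3 `StreamingTextResponse` helper (the handler and request shape are hypothetical):

import { StreamingTextResponse } from 'ai'

// Hypothetical Next.js route handler that serves the wrapped stream to the client.
// `runOpenAIChatCompletionStream` is the function from the snippet above.
export async function POST(req: Request) {
  const { options, variables, config, compatibility } = await req.json()

  const stream = await runOpenAIChatCompletionStream({
    credentials: { apiKey: process.env.OPENAI_API_KEY! },
    options,
    variables,
    config,
    compatibility,
  })

  return new StreamingTextResponse(stream)
}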

baptisteArno avatar Jul 15 '24 07:07 baptisteArno

Do we have any timeline for this?

Also, when the client automatically makes a request, is it possible to control what is sent to the backend, and if so, how?

tjazsilovsek avatar Jul 21 '24 10:07 tjazsilovsek

My current solution:

// `streamText` and `LanguageModel` come from the AI SDK; `web_search` is an
// app-specific tool and `ChatMessage` the app's message type (compatible with
// the SDK's CoreMessage shape), both assumed to be defined elsewhere.
import { streamText, type LanguageModel } from 'ai'

async function* streamTextWithTools(model: LanguageModel, messages: ChatMessage[], maxRounds = 5) {
  for (let round = 0; round < maxRounds; round++) {
    const result = await streamText({
      model,
      messages,
      tools: { web_search },
    })
    for await (const chunk of result.fullStream) {
      if (chunk.type === 'text-delta') {
        yield chunk
      } else if (chunk.type === 'tool-call') {
        messages.push({ role: 'assistant', content: [chunk] })
      } else if (chunk.type === 'tool-result') {
        messages.push({ role: 'tool', content: [chunk] })
      } else if (chunk.type === 'error') {
        throw chunk.error
      } else if (chunk.type === 'finish' && chunk.finishReason !== 'tool-calls') {
        return
      }
    }
  }
}
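
A possible way to consume the generator (the model and messages here are placeholders):

import { openai } from '@ai-sdk/openai'

// Text streams out while tool rounds are handled inside the generator.
const model = openai('gpt-4o')
const messages: ChatMessage[] = [{ role: 'user', content: 'What is the weather in Berlin?' }]

for await (const chunk of streamTextWithTools(model, messages)) {
  process.stdout.write(chunk.textDelta)
}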

wong2 avatar Jul 21 '24 10:07 wong2

@wong2 how do you then convert the data to get it to work with the useChat hook on the frontend? The only option I see is to copy the logic from the core package?

tjazsilovsek avatar Aug 01 '24 18:08 tjazsilovsek

@wong2 how do you then convert the data to get it to work with the useChat hook on the frontend? The only option I see is to copy the logic from the core package?

I'm not using useChat on the frontend.

wong2 avatar Aug 02 '24 02:08 wong2

I can't use useChat personally, because I use Tauri, which is client-side Next.js only (unless I reimplement things in Rust).

My hack:

// `generateText` and `streamText` come from the AI SDK and `z` from zod;
// `provider`, `screenpipeMultiQuery`, `queryScreenpipeNtimes`, `askQuestion`,
// and `input` are app-specific and assumed to be defined elsewhere.
import { generateText, streamText } from "ai";
import { z } from "zod";

await generateText({
  model: provider,
  tools: {
    suggest_queries: {
      description: `Suggest queries for the user's question and ask for confirmation. Example: 
        {
          suggested_queries: [
            { content_type: "audio", start_time: "2024-03-01T00:00:00Z", end_time: "2024-03-01T23:59:59Z", q: "screenpipe" },
            { content_type: "ocr", app_name: "arc", start_time: "2024-03-01T00:00:00Z", end_time: "2024-03-01T23:59:59Z", q: "screenpipe" },
          ]
        }
        
        - q contains a single query, again, for example instead of "life plan" just use "life"
        - When using the query_screenpipe tool, respond with only the updated JSON object
        - If you return something else than JSON the universe will come to an end
        - DO NOT add \`\`\`json at the beginning or end of your response
        - Do not use '"' around your response
        - Date & time now is ${new Date().toISOString()}. Adjust start_date and end_date to properly match the user intent time range.
        `,
      parameters: z.object({
        suggested_queries: screenpipeMultiQuery,
        queries_results: z
          .array(z.string())
          .optional()
          .describe(
            "The results of the queries if called after the tool query_screenpipe"
          ),
      }),
      execute: async ({ suggested_queries }) => {
        console.log("Suggested queries:", suggested_queries);
        const confirmation = await askQuestion(
          "Are these queries good? (yes/no): "
        );
        if (confirmation.toLowerCase() === "yes") {
          return { confirmed: true, queries: suggested_queries };
        } else {
          const feedback = await askQuestion(
            "Please provide feedback or adjustments: "
          );
          return { confirmed: false, feedback };
        }
      },
    },
    query_screenpipe: {
      description:
        "Query the local screenpipe instance for relevant information.",
      parameters: screenpipeMultiQuery,
      execute: queryScreenpipeNtimes,
    },
    stream_response: {
      description:
        "Stream the final response to the user. ALWAYS FINISH WITH THIS TOOL",
      parameters: z.object({
        response: z
          .string()
          .describe("The final response to stream to the user"),
      }),
      execute: async ({ response }) => {
        const { textStream } = await streamText({
          model: provider,
          messages: [{ role: "user", content: response }],
        });
        for await (const chunk of textStream) {
          process.stdout.write(chunk);
        }
        console.log("\n");
        throw new Error("STREAM_COMPLETE");
      },
    },
  },
  toolChoice: "required",
  messages: [
    {
      role: "system",
      content: `You are a helpful assistant that uses Screenpipe to answer user questions.
      First, suggest queries to the user and ask for confirmation. If confirmed, proceed with the search.
      If not confirmed, adjust based on user feedback. Use the query_screenpipe tool to search for information,
      and then use the stream_response tool to provide the final answer to the user.
      
      Rules:
      - User's today's date is ${new Date().toISOString().split("T")[0]}
      - Use multiple queries to get more relevant results
      - If the results of the queries are not relevant, adjust the query and ask for confirmation again. Minimize user's effort.
      - ALWAYS END WITH the stream_response tool to stream the final answer to the user
      - In the suggest_queries tool, always tell the user the parameters available to you (e.g. types, etc. Zod given to you) so the user can adjust the query if needed. Suggest few other changes on the arg you used so the user has some ideas.
      - Make sure to use enough data but not too much. Usually 50k+ rows a day.
      
      `,
    },
    {
      role: "user",
      content: input,
    },
  ],
  maxToolRoundtrips: 10,
});

But I suspect this uses more tokens than it should (on the final answer of generateText?).

PS: I hope you won't call the LLM police regarding my prompt engineering techniques...

louis030195 avatar Aug 10 '24 13:08 louis030195

@lgrammel Any updates on adding this feature, or is it too complex?

rostikmanko avatar Aug 14 '24 21:08 rostikmanko

I might add that I'd prefer a sensible default so the behaviour matches 1:1 what I'm getting with the OpenAI SDK; otherwise this might feel like a "downgrade".

mishushakov avatar Aug 18 '24 12:08 mishushakov

WIP PR: https://github.com/vercel/ai/pull/2836

lgrammel avatar Aug 29 '24 10:08 lgrammel

Available in [email protected] https://sdk.vercel.ai/docs/ai-sdk-core/tools-and-tool-calling#example-streamtext
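
A minimal sketch of the released option, following the linked docs (the model, tool, and prompt are placeholders):

import { openai } from '@ai-sdk/openai';
import { streamText, tool } from 'ai';
import { z } from 'zod';

const result = await streamText({
  model: openai('gpt-4o'),
  tools: {
    weather: tool({
      description: 'Get the weather in a location',
      parameters: z.object({ location: z.string() }),
      execute: async ({ location }) => ({ location, temperature: 72 }),
    }),
  },
  // tool round-trips now happen server-side, as requested in this issue
  maxToolRoundtrips: 5,
  prompt: 'What is the weather in San Francisco?',
});

for await (const textPart of result.textStream) {
  process.stdout.write(textPart);
}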

lgrammel avatar Aug 30 '24 12:08 lgrammel

@lgrammel Thanks for this 🙌 I found an issue with the implementation. I reported the details here

danielzohar avatar Sep 01 '24 10:09 danielzohar