
Adding an exponential backoff retry decorator to OpenAI embedding and completion calls

cfortuner opened this issue 2 years ago · 5 comments

Adds a new utility -> Retry!

  • Updated the OpenAI provider's embedOne, generate, and stream methods to use Retry!

It's a TypeScript decorator that lets you retry a call up to N times:

  @retry(3)
  async generate(
    promptText: string,
    options: GenerateCompletionOptions = DEFAULT_COMPLETION_OPTIONS
  ) {
    try {
      // ... (body truncated in the original comment)
    } catch (e) {
      // ...
    }
  }
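
For reference, here's a minimal sketch of how such a decorator could work with exponential backoff baked in. This is a hypothetical illustration rather than the actual PR implementation; it assumes TypeScript's legacy experimentalDecorators setting and a configurable base delay:

  // Hypothetical sketch of a retry method decorator with exponential backoff.
  // Not the actual PR code; assumes "experimentalDecorators": true in tsconfig.
  function retry(maxAttempts: number, baseDelayMs: number = 1000) {
    return function (
      _target: any,
      _propertyKey: string,
      descriptor: PropertyDescriptor
    ) {
      const original = descriptor.value;
      descriptor.value = async function (...args: any[]) {
        for (let attempt = 1; ; attempt++) {
          try {
            return await original.apply(this, args);
          } catch (err) {
            if (attempt >= maxAttempts) throw err;
            // Back off exponentially: baseDelayMs, then 2x, 4x, ...
            const delayMs = baseDelayMs * 2 ** (attempt - 1);
            await new Promise((resolve) => setTimeout(resolve, delayMs));
          }
        }
      };
      return descriptor;
    };
  }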

Let me know what you think!

cfortuner · Feb 14 '23 18:02

Another option for the retry logic here, and one that is probably more performant and better supported, is the async-retry library.

Here's an example implementation from my codebase, wrapping the openai SDK:

import retry from "async-retry";

// `trace` is the promptable tracing wrapper; `openai` is a configured
// OpenAIApi client from the (v3) openai SDK.
export const openaiCompletion = trace("openaiCompletion", _openaiCompletion);

async function _openaiCompletion(
    prompt: string,
    model: string = "text-davinci-003",
    temperature: number = 1,
    nTokens: number = 500
): Promise<string> {
    const response = await retry(
        // `bail` can be used to abort retries on non-retryable errors;
        // it's unused here, so every failure is retried.
        async (bail) => {
            return openai.createCompletion({
                model,
                prompt,
                temperature,
                max_tokens: nTokens,
                top_p: 1,
                frequency_penalty: 0,
                presence_penalty: 0
            })
        },
        {
            retries: 8,       // retry up to 8 times after the first attempt
            factor: 4,        // exponential backoff multiplier
            minTimeout: 1000, // first retry after ~1 second
            // onRetry: (error: any) => console.log(error)
        }
    )
    const text = response.data.choices[0].text
    return text!
}

Thoughts:

  • Tracing this with promptable is useful, as an aside.
  • It lets you pass an arbitrary error handler callback (not wired up here; see the bail sketch after this list).
  • Retry behavior (e.g. retries and factor) can be parameterized and exposed so users can turn the knobs.
  • Not a lot of overhead to add: just an extra import.
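
For example, the bail callback that async-retry passes in could be wired up to stop retrying on non-retryable errors. A rough sketch, reusing the openai client and retry import from the example above and assuming the v3 SDK's err.response.status error shape:

const response = await retry(
    async (bail) => {
        try {
            return await openai.createCompletion({ model, prompt });
        } catch (err: any) {
            const status = err?.response?.status;
            // Client errors other than 429 (rate limit) won't succeed on
            // retry, so abort immediately instead of burning attempts.
            if (status && status !== 429 && status < 500) {
                bail(err);
                return;
            }
            throw err;
        }
    },
    { retries: 8, factor: 4, minTimeout: 1000 }
);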

I would probably opt to add this at the base ModelProvider level, and I think it's worth considering implementing it as a function decorator so that the retry logic can be added without too much extra boilerplate. That said, thinking carefully about retry logic on an API-specific basis is a really good idea, because what works for the OpenAI APIs might not hold for other service providers.

yourbuddyconner · Feb 15 '23 22:02

Just updating here: holding off on adding this for now. We have some other ideas we'd like to try that might be better.

cfortuner · Feb 16 '23 02:02

Cool @cfortuner, lmk if you want me to review the solution when you have a PR.

yourbuddyconner · Feb 16 '23 04:02

Having read @mathisobadia's comments, I think that makes more sense. There is a trade-off here:

  • With batching, we save on request count, so fewer requests count toward the requests-per-minute rate limit.
  • One by one, we save processing time because we don't lose work when a whole batch is rejected, but it means many more requests count toward the requests-per-minute rate limit.

One more thing to consider is how to handle embedding requests that exceed 250k tokens per minute. With batching, we have to construct each batch so it stays below that limit (see the sketch below). One by one, we are safe, since a single embedding request cannot exceed the model's max token length; even if the total exceeds 250k/min, the retry policies would handle it.
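
To make that concrete, here's a rough sketch of packing texts into batches under a per-request token budget. The countTokens helper is hypothetical; any tokenizer wrapper would do:

// Hypothetical sketch: greedily pack texts into batches that stay under
// a token budget, so a whole batch is never rejected for being too large.
function batchByTokens(
    texts: string[],
    countTokens: (text: string) => number, // assumed tokenizer helper
    maxTokensPerBatch: number
): string[][] {
    const batches: string[][] = [];
    let current: string[] = [];
    let currentTokens = 0;
    for (const text of texts) {
        const n = countTokens(text);
        if (current.length > 0 && currentTokens + n > maxTokensPerBatch) {
            batches.push(current);
            current = [];
            currentTokens = 0;
        }
        current.push(text);
        currentTokens += n;
    }
    if (current.length > 0) batches.push(current);
    return batches;
}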

So I'd say reducing the time needed to process embeddings and being able to handle large texts is more important (at least for my case). But maybe we could have processing policies that handle both.

ymansurozer · Feb 16 '23 11:02