Implement Custom Retry Callback for Error-Specific Retries and Exponential Backoff
Feature Description
I propose a parameter that allows configuring retry behavior via a callback function. This function should receive the error and the retry count and return the number of milliseconds to wait until the next retry. Instead of returning milliseconds, the user could throw to cancel further retries.
Example:
let { text } = await generateText({
  model: yourModel,
  prompt,
  onRetry: ({ retry, error }) => {
    if (retry === 5) throw error; // maximum 5 retries
    if (!APICallError.isInstance(error) || error.statusCode !== 429) throw error; // only retry specific errors
    return Math.min(100 * 2 ** retry, 2000); // exponential backoff
  },
});
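Roughly, the proposed callback could be typed like this (a sketch only; onRetry is not an existing generateText option, and the exact names are up for discussion):

type OnRetryCallback = (context: {
  /** the error that triggered the retry decision */
  error: unknown;
  /** number of retries already attempted */
  retry: number;
}) => number | Promise<number>; // milliseconds to wait before the next attempt; throw to stop retrying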
Use Cases
I've had Google return 429 status codes, and my program failed once the maximum number of retries was reached. Their docs recommend using exponential backoff when dealing with 429. The custom exponential backoff below fixed it for me, but I think this could be solved with a better DX inside the SDK.
let responseText = "";
let retryCount = 0;
let delay = 500;
while (true) {
try {
let { text } = await generateText({
model: google("gemini-2.0-flash"),
prompt,
maxRetries: 0,
});
responseText = text;
break;
} catch (e) {
if (APICallError.isInstance(e) && e.statusCode === 429) {
retryCount++;
if (retryCount > 10) {
throw e;
}
await new Promise((resolve) => setTimeout(resolve, delay));
delay = Math.min(delay * 2, 2000);
} else {
throw e;
}
}
}
Additional context
I was using the AI SDK to generate translations for my internationalized UI. The full JSON is too large to be translated in one shot, so I build chunks of 100 keys and send the translation prompts to the model sequentially.
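For context, the chunking loop looks roughly like this (a simplified sketch, not the actual script; messages and targetLanguage are assumed placeholders):

const entries = Object.entries(messages); // messages: the flat i18n JSON
const chunkSize = 100;
for (let i = 0; i < entries.length; i += chunkSize) {
  const chunk = Object.fromEntries(entries.slice(i, i + chunkSize));
  const { text } = await generateText({
    model: google("gemini-2.0-flash"),
    prompt: `Translate the values of this JSON to ${targetLanguage}:\n${JSON.stringify(chunk)}`,
  });
  // parse `text` and merge the translated keys back into the result here
}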
The full error message
An error occurred: RetryError [AI_RetryError]: Failed after 3 attempts. Last error: Resource has been exhausted (e.g. check quota).
at _retryWithExponentialBackoff (/Users/carlassmann/Projects/daqqi/node_modules/.pnpm/[email protected][email protected][email protected]/node_modules/ai/util/retry-with-exponential-backoff.ts:51:13)
at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
at async fn (/Users/carlassmann/Projects/daqqi/node_modules/.pnpm/[email protected][email protected][email protected]/node_modules/ai/core/generate-text/generate-text.ts:308:32)
at async (/Users/carlassmann/Projects/daqqi/node_modules/.pnpm/[email protected][email protected][email protected]/node_modules/ai/core/telemetry/record-span.ts:18:22)
at async ai (/Users/carlassmann/Projects/daqqi/apps/app/scripts/translate-with-ai.ts:37:32) {
cause: undefined,
reason: 'maxRetriesExceeded',
errors: [
APICallError [AI_APICallError]: Resource has been exhausted (e.g. check quota).
at (/Users/carlassmann/Projects/daqqi/node_modules/.pnpm/@[email protected][email protected]/node_modules/@ai-sdk/provider-utils/src/response-handler.ts:59:16)
at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
at async postToApi (/Users/carlassmann/Projects/daqqi/node_modules/.pnpm/@[email protected][email protected]/node_modules/@ai-sdk/provider-utils/src/post-to-api.ts:81:28)
at async GoogleGenerativeAILanguageModel.doGenerate (/Users/carlassmann/Projects/daqqi/node_modules/.pnpm/@[email protected][email protected]/node_modules/@ai-sdk/google/src/google-generative-ai-language-model.ts:213:50)
at async fn (/Users/carlassmann/Projects/daqqi/node_modules/.pnpm/[email protected][email protected][email protected]/node_modules/ai/core/generate-text/generate-text.ts:348:30)
at async (/Users/carlassmann/Projects/daqqi/node_modules/.pnpm/[email protected][email protected][email protected]/node_modules/ai/core/telemetry/record-span.ts:18:22)
at async _retryWithExponentialBackoff (/Users/carlassmann/Projects/daqqi/node_modules/.pnpm/[email protected][email protected][email protected]/node_modules/ai/util/retry-with-exponential-backoff.ts:36:12)
at async fn (/Users/carlassmann/Projects/daqqi/node_modules/.pnpm/[email protected][email protected][email protected]/node_modules/ai/core/generate-text/generate-text.ts:308:32)
at async (/Users/carlassmann/Projects/daqqi/node_modules/.pnpm/[email protected][email protected][email protected]/node_modules/ai/core/telemetry/record-span.ts:18:22)
at async ai (/Users/carlassmann/Projects/daqqi/apps/app/scripts/translate-with-ai.ts:37:32) {
cause: undefined,
url: 'https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash:generateContent',
requestBodyValues: [Object],
statusCode: 429,
responseHeaders: [Object],
responseBody: '{\n' +
' "error": {\n' +
' "code": 429,\n' +
' "message": "Resource has been exhausted (e.g. check quota).",\n' +
' "status": "RESOURCE_EXHAUSTED"\n' +
' }\n' +
'}\n',
isRetryable: true,
data: [Object],
[Symbol(vercel.ai.error)]: true,
[Symbol(vercel.ai.error.AI_APICallError)]: true
},
APICallError [AI_APICallError]: Resource has been exhausted (e.g. check quota).
at (/Users/carlassmann/Projects/daqqi/node_modules/.pnpm/@[email protected][email protected]/node_modules/@ai-sdk/provider-utils/src/response-handler.ts:59:16)
at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
at async postToApi (/Users/carlassmann/Projects/daqqi/node_modules/.pnpm/@[email protected][email protected]/node_modules/@ai-sdk/provider-utils/src/post-to-api.ts:81:28)
at async GoogleGenerativeAILanguageModel.doGenerate (/Users/carlassmann/Projects/daqqi/node_modules/.pnpm/@[email protected][email protected]/node_modules/@ai-sdk/google/src/google-generative-ai-language-model.ts:213:50)
at async fn (/Users/carlassmann/Projects/daqqi/node_modules/.pnpm/[email protected][email protected][email protected]/node_modules/ai/core/generate-text/generate-text.ts:348:30)
at async (/Users/carlassmann/Projects/daqqi/node_modules/.pnpm/[email protected][email protected][email protected]/node_modules/ai/core/telemetry/record-span.ts:18:22)
at async _retryWithExponentialBackoff (/Users/carlassmann/Projects/daqqi/node_modules/.pnpm/[email protected][email protected][email protected]/node_modules/ai/util/retry-with-exponential-backoff.ts:36:12)
at async fn (/Users/carlassmann/Projects/daqqi/node_modules/.pnpm/[email protected][email protected][email protected]/node_modules/ai/core/generate-text/generate-text.ts:308:32)
at async (/Users/carlassmann/Projects/daqqi/node_modules/.pnpm/[email protected][email protected][email protected]/node_modules/ai/core/telemetry/record-span.ts:18:22)
at async ai (/Users/carlassmann/Projects/daqqi/apps/app/scripts/translate-with-ai.ts:37:32) {
cause: undefined,
url: 'https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash:generateContent',
requestBodyValues: [Object],
statusCode: 429,
responseHeaders: [Object],
responseBody: '{\n' +
' "error": {\n' +
' "code": 429,\n' +
' "message": "Resource has been exhausted (e.g. check quota).",\n' +
' "status": "RESOURCE_EXHAUSTED"\n' +
' }\n' +
'}\n',
isRetryable: true,
data: [Object],
[Symbol(vercel.ai.error)]: true,
[Symbol(vercel.ai.error.AI_APICallError)]: true
},
APICallError [AI_APICallError]: Resource has been exhausted (e.g. check quota).
at (/Users/carlassmann/Projects/daqqi/node_modules/.pnpm/@[email protected][email protected]/node_modules/@ai-sdk/provider-utils/src/response-handler.ts:59:16)
at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
at async postToApi (/Users/carlassmann/Projects/daqqi/node_modules/.pnpm/@[email protected][email protected]/node_modules/@ai-sdk/provider-utils/src/post-to-api.ts:81:28)
at async GoogleGenerativeAILanguageModel.doGenerate (/Users/carlassmann/Projects/daqqi/node_modules/.pnpm/@[email protected][email protected]/node_modules/@ai-sdk/google/src/google-generative-ai-language-model.ts:213:50)
at async fn (/Users/carlassmann/Projects/daqqi/node_modules/.pnpm/[email protected][email protected][email protected]/node_modules/ai/core/generate-text/generate-text.ts:348:30)
at async (/Users/carlassmann/Projects/daqqi/node_modules/.pnpm/[email protected][email protected][email protected]/node_modules/ai/core/telemetry/record-span.ts:18:22)
at async _retryWithExponentialBackoff (/Users/carlassmann/Projects/daqqi/node_modules/.pnpm/[email protected][email protected][email protected]/node_modules/ai/util/retry-with-exponential-backoff.ts:36:12)
at async fn (/Users/carlassmann/Projects/daqqi/node_modules/.pnpm/[email protected][email protected][email protected]/node_modules/ai/core/generate-text/generate-text.ts:308:32)
at async (/Users/carlassmann/Projects/daqqi/node_modules/.pnpm/[email protected][email protected][email protected]/node_modules/ai/core/telemetry/record-span.ts:18:22)
at async ai (/Users/carlassmann/Projects/daqqi/apps/app/scripts/translate-with-ai.ts:37:32) {
cause: undefined,
url: 'https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash:generateContent',
requestBodyValues: [Object],
statusCode: 429,
responseHeaders: [Object],
responseBody: '{\n' +
' "error": {\n' +
' "code": 429,\n' +
' "message": "Resource has been exhausted (e.g. check quota).",\n' +
' "status": "RESOURCE_EXHAUSTED"\n' +
' }\n' +
'}\n',
isRetryable: true,
data: [Object],
[Symbol(vercel.ai.error)]: true,
[Symbol(vercel.ai.error.AI_APICallError)]: true
}
],
lastError: APICallError [AI_APICallError]: Resource has been exhausted (e.g. check quota).
at (/Users/carlassmann/Projects/daqqi/node_modules/.pnpm/@[email protected][email protected]/node_modules/@ai-sdk/provider-utils/src/response-handler.ts:59:16)
at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
at async postToApi (/Users/carlassmann/Projects/daqqi/node_modules/.pnpm/@[email protected][email protected]/node_modules/@ai-sdk/provider-utils/src/post-to-api.ts:81:28)
at async GoogleGenerativeAILanguageModel.doGenerate (/Users/carlassmann/Projects/daqqi/node_modules/.pnpm/@[email protected][email protected]/node_modules/@ai-sdk/google/src/google-generative-ai-language-model.ts:213:50)
at async fn (/Users/carlassmann/Projects/daqqi/node_modules/.pnpm/[email protected][email protected][email protected]/node_modules/ai/core/generate-text/generate-text.ts:348:30)
at async (/Users/carlassmann/Projects/daqqi/node_modules/.pnpm/[email protected][email protected][email protected]/node_modules/ai/core/telemetry/record-span.ts:18:22)
at async _retryWithExponentialBackoff (/Users/carlassmann/Projects/daqqi/node_modules/.pnpm/[email protected][email protected][email protected]/node_modules/ai/util/retry-with-exponential-backoff.ts:36:12)
at async fn (/Users/carlassmann/Projects/daqqi/node_modules/.pnpm/[email protected][email protected][email protected]/node_modules/ai/core/generate-text/generate-text.ts:308:32)
at async (/Users/carlassmann/Projects/daqqi/node_modules/.pnpm/[email protected][email protected][email protected]/node_modules/ai/core/telemetry/record-span.ts:18:22)
at async ai (/Users/carlassmann/Projects/daqqi/apps/app/scripts/translate-with-ai.ts:37:32) {
cause: undefined,
url: 'https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash:generateContent',
requestBodyValues: {
generationConfig: [Object],
contents: [Array],
systemInstruction: undefined,
safetySettings: undefined,
tools: undefined,
toolConfig: undefined,
cachedContent: undefined
},
statusCode: 429,
responseHeaders: {
'alt-svc': 'h3=":443"; ma=2592000,h3-29=":443"; ma=2592000',
'content-encoding': 'gzip',
'content-type': 'application/json; charset=UTF-8',
date: 'Tue, 11 Feb 2025 20:44:20 GMT',
server: 'scaffolding on HTTPServer2',
'server-timing': 'gfet4t7; dur=60',
'transfer-encoding': 'chunked',
vary: 'Origin, X-Origin, Referer',
'x-content-type-options': 'nosniff',
'x-frame-options': 'SAMEORIGIN',
'x-xss-protection': '0'
},
responseBody: '{\n' +
' "error": {\n' +
' "code": 429,\n' +
' "message": "Resource has been exhausted (e.g. check quota).",\n' +
' "status": "RESOURCE_EXHAUSTED"\n' +
' }\n' +
'}\n',
isRetryable: true,
data: { error: [Object] },
[Symbol(vercel.ai.error)]: true,
[Symbol(vercel.ai.error.AI_APICallError)]: true
},
[Symbol(vercel.ai.error)]: true,
[Symbol(vercel.ai.error.AI_RetryError)]: true
}
The AI SDK has built-in retries with exponential backoff that do this. You can configure them via maxRetries.
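For example (maxRetries is an existing generateText option; the default of 2 retries matches the "Failed after 3 attempts" in the log above):

const { text } = await generateText({
  model: yourModel,
  prompt,
  maxRetries: 5, // default is 2; set to 0 to disable retries
});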
Why is this needed?
The number of retries is configurable via maxRetries. The docs say nothing about exponential backoff.
Even with maxRetries, the backoff time is not configurable. Error-specific retry is not configurable.
There is another issue about this: https://github.com/vercel/ai/issues/3619
Could we simply expose the backoff settings in the API? That would close this issue with a simple fix. At the moment the docs don't clarify whether retries even use backoff internally.
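A hypothetical shape for such exposed settings (the retryBackoff option below is made up for illustration and does not exist in the SDK):

const { text } = await generateText({
  model: yourModel,
  prompt,
  maxRetries: 10,
  // hypothetical option, not part of the current API:
  retryBackoff: {
    initialDelayMs: 500,
    maxDelayMs: 2000,
    factor: 2,
  },
});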
We need this just to be able to show the user that things are being retried; otherwise the UI looks frozen.
What would the ideal API look like? I can't prioritize working on it right away, but I have it on my backlog. It would be helpful to find a consensus on the API design.
I'd add a defaultDelay to my proposal to support doing other work when a retry is being scheduled. Example:
let { text } = await generateText({
  model: yourModel,
  prompt,
  onRetry: async ({ retry, error, defaultDelay }) => {
    console.warn(`Retry #${retry}`, error);
    if (retry === 2) {
      // maybe flush some metrics here
    }
    // keep default exponential backoff behavior
    return defaultDelay;
  },
});
onRetry should support sync or async functions.
The above works for me, but my needs are minimal; I just need to be able to show retries in the UI.
I think it might be preferable for onRetry to return a boolean indicating whether the retry should happen, rather than how long to delay. You can just use an async function to control the delay yourself. The callback should also include the retryCount. A sketch of that variant is below.
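For illustration, that variant could look like this (hypothetical API; a boolean-returning onRetry does not exist today):

let { text } = await generateText({
  model: yourModel,
  prompt,
  onRetry: async ({ error, retryCount }) => {
    if (retryCount >= 5) return false; // give up
    if (!APICallError.isInstance(error) || error.statusCode !== 429) return false; // only retry 429s
    // control the delay yourself by awaiting before returning
    await new Promise((resolve) => setTimeout(resolve, Math.min(100 * 2 ** retryCount, 2000)));
    return true; // retry
  },
});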
This is similar to other popular backoff/retry libraries such as https://www.npmjs.com/package/exponential-backoff
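As a workaround today, one can wrap the call with that library and disable the SDK's own retries (a sketch; check the exponential-backoff docs for the exact option names):

import { backOff } from "exponential-backoff";
import { generateText, APICallError } from "ai";

const { text } = await backOff(
  () => generateText({ model: yourModel, prompt, maxRetries: 0 }),
  {
    numOfAttempts: 10,
    startingDelay: 500,
    maxDelay: 2000,
    timeMultiple: 2,
    retry: (e) => APICallError.isInstance(e) && e.statusCode === 429, // only retry rate limits
  },
);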
The number of retries is configurable via maxRetries. The docs say nothing about exponential backoff. Even with maxRetries, the backoff time is not configurable. Error-specific retry is not configurable.
For context, this does seem to use exponential backoff under the hood, even if there's no explicit reference to it in the docs. See: ai/packages/ai/src/util/retry-with-exponential-backoff.ts
(FYI @ccssmnn)