Implement Custom Retry Callback for Error-Specific Retries and Exponential Backoff
Feature Description
I propose a parameter that allows configuring retry behavior via a callback function. This function should receive the error and the retry count and return the number of milliseconds to wait until the next retry. Instead of returning milliseconds, the user could throw to cancel further retries.
Example:
let { text } = await generateText({
  model: yourModel,
  prompt,
  onRetry: ({ retry, error }) => {
    if (retry === 5) throw error; // maximum 5 retries
    if (!APICallError.isInstance(error) || error.statusCode !== 429) throw error; // only retry specific errors
    return Math.min(100 * 2 ** retry, 2000); // exponential backoff
  },
});
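Roughly, the proposed callback could be typed like this (a sketch only; onRetry is not an existing generateText option, and the exact names are up for discussion):

type OnRetryCallback = (context: {
  /** the error that triggered the retry decision */
  error: unknown;
  /** number of retries already attempted */
  retry: number;
}) => number | Promise<number>; // milliseconds to wait before the next attempt; throw to stop retrying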
Use Cases
I've had Google return 429 status codes, and my program failed once the maximum number of retries was reached. Their docs recommend using exponential backoff when dealing with 429. The custom exponential backoff below fixed it for me, but I think this could be solved with a better DX inside the SDK.
let responseText = "";
let retryCount = 0;
let delay = 500;
while (true) {
try {
let { text } = await generateText({
model: google("gemini-2.0-flash"),
prompt,
maxRetries: 0,
});
responseText = text;
break;
} catch (e) {
if (APICallError.isInstance(e) && e.statusCode === 429) {
retryCount++;
if (retryCount > 10) {
throw e;
}
await new Promise((resolve) => setTimeout(resolve, delay));
delay = Math.min(delay * 2, 2000);
} else {
throw e;
}
}
}
Additional context
I was using the AI SDK to generate translations for my internationalized UI. The full JSON is too large to be translated in one shot, so I build chunks of 100 keys and send the translation prompts to the model sequentially.
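For context, the chunking loop looks roughly like this (a simplified sketch, not the actual script; messages and targetLanguage are assumed placeholders):

const entries = Object.entries(messages); // messages: the flat i18n JSON
const chunkSize = 100;
for (let i = 0; i < entries.length; i += chunkSize) {
  const chunk = Object.fromEntries(entries.slice(i, i + chunkSize));
  const { text } = await generateText({
    model: google("gemini-2.0-flash"),
    prompt: `Translate the values of this JSON to ${targetLanguage}:\n${JSON.stringify(chunk)}`,
  });
  // parse `text` and merge the translated keys back into the result here
}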
The full error message
An error occurred: RetryError [AI_RetryError]: Failed after 3 attempts. Last error: Resource has been exhausted (e.g. check quota).
at _retryWithExponentialBackoff (/Users/carlassmann/Projects/daqqi/node_modules/.pnpm/[email protected][email protected][email protected]/node_modules/ai/util/retry-with-exponential-backoff.ts:51:13)
at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
at async fn (/Users/carlassmann/Projects/daqqi/node_modules/.pnpm/[email protected][email protected][email protected]/node_modules/ai/core/generate-text/generate-text.ts:308:32)
at async (/Users/carlassmann/Projects/daqqi/node_modules/.pnpm/[email protected][email protected][email protected]/node_modules/ai/core/telemetry/record-span.ts:18:22)
at async ai (/Users/carlassmann/Projects/daqqi/apps/app/scripts/translate-with-ai.ts:37:32) {
cause: undefined,
reason: 'maxRetriesExceeded',
errors: [
APICallError [AI_APICallError]: Resource has been exhausted (e.g. check quota).
at (/Users/carlassmann/Projects/daqqi/node_modules/.pnpm/@[email protected][email protected]/node_modules/@ai-sdk/provider-utils/src/response-handler.ts:59:16)
at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
at async postToApi (/Users/carlassmann/Projects/daqqi/node_modules/.pnpm/@[email protected][email protected]/node_modules/@ai-sdk/provider-utils/src/post-to-api.ts:81:28)
at async GoogleGenerativeAILanguageModel.doGenerate (/Users/carlassmann/Projects/daqqi/node_modules/.pnpm/@[email protected][email protected]/node_modules/@ai-sdk/google/src/google-generative-ai-language-model.ts:213:50)
at async fn (/Users/carlassmann/Projects/daqqi/node_modules/.pnpm/[email protected][email protected][email protected]/node_modules/ai/core/generate-text/generate-text.ts:348:30)
at async (/Users/carlassmann/Projects/daqqi/node_modules/.pnpm/[email protected][email protected][email protected]/node_modules/ai/core/telemetry/record-span.ts:18:22)
at async _retryWithExponentialBackoff (/Users/carlassmann/Projects/daqqi/node_modules/.pnpm/[email protected][email protected][email protected]/node_modules/ai/util/retry-with-exponential-backoff.ts:36:12)
at async fn (/Users/carlassmann/Projects/daqqi/node_modules/.pnpm/[email protected][email protected][email protected]/node_modules/ai/core/generate-text/generate-text.ts:308:32)
at async (/Users/carlassmann/Projects/daqqi/node_modules/.pnpm/[email protected][email protected][email protected]/node_modules/ai/core/telemetry/record-span.ts:18:22)
at async ai (/Users/carlassmann/Projects/daqqi/apps/app/scripts/translate-with-ai.ts:37:32) {
cause: undefined,
url: 'https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash:generateContent',
requestBodyValues: [Object],
statusCode: 429,
responseHeaders: [Object],
responseBody: '{\n' +
' "error": {\n' +
' "code": 429,\n' +
' "message": "Resource has been exhausted (e.g. check quota).",\n' +
' "status": "RESOURCE_EXHAUSTED"\n' +
' }\n' +
'}\n',
isRetryable: true,
data: [Object],
[Symbol(vercel.ai.error)]: true,
[Symbol(vercel.ai.error.AI_APICallError)]: true
},
APICallError [AI_APICallError]: Resource has been exhausted (e.g. check quota).
at (/Users/carlassmann/Projects/daqqi/node_modules/.pnpm/@[email protected][email protected]/node_modules/@ai-sdk/provider-utils/src/response-handler.ts:59:16)
at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
at async postToApi (/Users/carlassmann/Projects/daqqi/node_modules/.pnpm/@[email protected][email protected]/node_modules/@ai-sdk/provider-utils/src/post-to-api.ts:81:28)
at async GoogleGenerativeAILanguageModel.doGenerate (/Users/carlassmann/Projects/daqqi/node_modules/.pnpm/@[email protected][email protected]/node_modules/@ai-sdk/google/src/google-generative-ai-language-model.ts:213:50)
at async fn (/Users/carlassmann/Projects/daqqi/node_modules/.pnpm/[email protected][email protected][email protected]/node_modules/ai/core/generate-text/generate-text.ts:348:30)
at async (/Users/carlassmann/Projects/daqqi/node_modules/.pnpm/[email protected][email protected][email protected]/node_modules/ai/core/telemetry/record-span.ts:18:22)
at async _retryWithExponentialBackoff (/Users/carlassmann/Projects/daqqi/node_modules/.pnpm/[email protected][email protected][email protected]/node_modules/ai/util/retry-with-exponential-backoff.ts:36:12)
at async fn (/Users/carlassmann/Projects/daqqi/node_modules/.pnpm/[email protected][email protected][email protected]/node_modules/ai/core/generate-text/generate-text.ts:308:32)
at async (/Users/carlassmann/Projects/daqqi/node_modules/.pnpm/[email protected][email protected][email protected]/node_modules/ai/core/telemetry/record-span.ts:18:22)
at async ai (/Users/carlassmann/Projects/daqqi/apps/app/scripts/translate-with-ai.ts:37:32) {
cause: undefined,
url: 'https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash:generateContent',
requestBodyValues: [Object],
statusCode: 429,
responseHeaders: [Object],
responseBody: '{\n' +
' "error": {\n' +
' "code": 429,\n' +
' "message": "Resource has been exhausted (e.g. check quota).",\n' +
' "status": "RESOURCE_EXHAUSTED"\n' +
' }\n' +
'}\n',
isRetryable: true,
data: [Object],
[Symbol(vercel.ai.error)]: true,
[Symbol(vercel.ai.error.AI_APICallError)]: true
},
APICallError [AI_APICallError]: Resource has been exhausted (e.g. check quota).
at (/Users/carlassmann/Projects/daqqi/node_modules/.pnpm/@[email protected][email protected]/node_modules/@ai-sdk/provider-utils/src/response-handler.ts:59:16)
at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
at async postToApi (/Users/carlassmann/Projects/daqqi/node_modules/.pnpm/@[email protected][email protected]/node_modules/@ai-sdk/provider-utils/src/post-to-api.ts:81:28)
at async GoogleGenerativeAILanguageModel.doGenerate (/Users/carlassmann/Projects/daqqi/node_modules/.pnpm/@[email protected][email protected]/node_modules/@ai-sdk/google/src/google-generative-ai-language-model.ts:213:50)
at async fn (/Users/carlassmann/Projects/daqqi/node_modules/.pnpm/[email protected][email protected][email protected]/node_modules/ai/core/generate-text/generate-text.ts:348:30)
at async (/Users/carlassmann/Projects/daqqi/node_modules/.pnpm/[email protected][email protected][email protected]/node_modules/ai/core/telemetry/record-span.ts:18:22)
at async _retryWithExponentialBackoff (/Users/carlassmann/Projects/daqqi/node_modules/.pnpm/[email protected][email protected][email protected]/node_modules/ai/util/retry-with-exponential-backoff.ts:36:12)
at async fn (/Users/carlassmann/Projects/daqqi/node_modules/.pnpm/[email protected][email protected][email protected]/node_modules/ai/core/generate-text/generate-text.ts:308:32)
at async (/Users/carlassmann/Projects/daqqi/node_modules/.pnpm/[email protected][email protected][email protected]/node_modules/ai/core/telemetry/record-span.ts:18:22)
at async ai (/Users/carlassmann/Projects/daqqi/apps/app/scripts/translate-with-ai.ts:37:32) {
cause: undefined,
url: 'https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash:generateContent',
requestBodyValues: [Object],
statusCode: 429,
responseHeaders: [Object],
responseBody: '{\n' +
' "error": {\n' +
' "code": 429,\n' +
' "message": "Resource has been exhausted (e.g. check quota).",\n' +
' "status": "RESOURCE_EXHAUSTED"\n' +
' }\n' +
'}\n',
isRetryable: true,
data: [Object],
[Symbol(vercel.ai.error)]: true,
[Symbol(vercel.ai.error.AI_APICallError)]: true
}
],
lastError: APICallError [AI_APICallError]: Resource has been exhausted (e.g. check quota).
at (/Users/carlassmann/Projects/daqqi/node_modules/.pnpm/@[email protected][email protected]/node_modules/@ai-sdk/provider-utils/src/response-handler.ts:59:16)
at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
at async postToApi (/Users/carlassmann/Projects/daqqi/node_modules/.pnpm/@[email protected][email protected]/node_modules/@ai-sdk/provider-utils/src/post-to-api.ts:81:28)
at async GoogleGenerativeAILanguageModel.doGenerate (/Users/carlassmann/Projects/daqqi/node_modules/.pnpm/@[email protected][email protected]/node_modules/@ai-sdk/google/src/google-generative-ai-language-model.ts:213:50)
at async fn (/Users/carlassmann/Projects/daqqi/node_modules/.pnpm/[email protected][email protected][email protected]/node_modules/ai/core/generate-text/generate-text.ts:348:30)
at async (/Users/carlassmann/Projects/daqqi/node_modules/.pnpm/[email protected][email protected][email protected]/node_modules/ai/core/telemetry/record-span.ts:18:22)
at async _retryWithExponentialBackoff (/Users/carlassmann/Projects/daqqi/node_modules/.pnpm/[email protected][email protected][email protected]/node_modules/ai/util/retry-with-exponential-backoff.ts:36:12)
at async fn (/Users/carlassmann/Projects/daqqi/node_modules/.pnpm/[email protected][email protected][email protected]/node_modules/ai/core/generate-text/generate-text.ts:308:32)
at async (/Users/carlassmann/Projects/daqqi/node_modules/.pnpm/[email protected][email protected][email protected]/node_modules/ai/core/telemetry/record-span.ts:18:22)
at async ai (/Users/carlassmann/Projects/daqqi/apps/app/scripts/translate-with-ai.ts:37:32) {
cause: undefined,
url: 'https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash:generateContent',
requestBodyValues: {
generationConfig: [Object],
contents: [Array],
systemInstruction: undefined,
safetySettings: undefined,
tools: undefined,
toolConfig: undefined,
cachedContent: undefined
},
statusCode: 429,
responseHeaders: {
'alt-svc': 'h3=":443"; ma=2592000,h3-29=":443"; ma=2592000',
'content-encoding': 'gzip',
'content-type': 'application/json; charset=UTF-8',
date: 'Tue, 11 Feb 2025 20:44:20 GMT',
server: 'scaffolding on HTTPServer2',
'server-timing': 'gfet4t7; dur=60',
'transfer-encoding': 'chunked',
vary: 'Origin, X-Origin, Referer',
'x-content-type-options': 'nosniff',
'x-frame-options': 'SAMEORIGIN',
'x-xss-protection': '0'
},
responseBody: '{\n' +
' "error": {\n' +
' "code": 429,\n' +
' "message": "Resource has been exhausted (e.g. check quota).",\n' +
' "status": "RESOURCE_EXHAUSTED"\n' +
' }\n' +
'}\n',
isRetryable: true,
data: { error: [Object] },
[Symbol(vercel.ai.error)]: true,
[Symbol(vercel.ai.error.AI_APICallError)]: true
},
[Symbol(vercel.ai.error)]: true,
[Symbol(vercel.ai.error.AI_RetryError)]: true
}
The AI SDK has built-in retries with exponential backoff that do this. You can configure them via maxRetries.
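For example (maxRetries is an existing generateText option; the default of 2 retries matches the "Failed after 3 attempts" in the log above):

const { text } = await generateText({
  model: yourModel,
  prompt,
  maxRetries: 5, // default is 2; set to 0 to disable retries
});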
Why is this needed?
The number of retries is configurable via maxRetries. The docs say nothing about exponential backoff.
Even with maxRetries, the backoff time is not configurable. Error-specific retry is not configurable.
There is another issue about this: https://github.com/vercel/ai/issues/3619
Could we simply expose the backoff settings in the API? That would close this issue with a simple fix. At the moment the docs don't clarify whether retries even use backoff internally.
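A hypothetical shape for such exposed settings (the retryBackoff option below is made up for illustration and does not exist in the SDK):

const { text } = await generateText({
  model: yourModel,
  prompt,
  maxRetries: 10,
  // hypothetical option, not part of the current API:
  retryBackoff: {
    initialDelayMs: 500,
    maxDelayMs: 2000,
    factor: 2,
  },
});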
We need this just to be able to show the user that things are being retried; otherwise the UI looks frozen.
What would the ideal API look like? I can't prioritize working on it right away, but I have it on my backlog. It would be helpful to find a consensus on the API design.
I'd add a defaultDelay to my proposal to support doing other work when a retry is being scheduled. Example:
let { text } = await generateText({
  model: yourModel,
  prompt,
  onRetry: async ({ retry, error, defaultDelay }) => {
    console.warn(`Retry #${retry}`, error);
    if (retry === 2) {
      // maybe flush some metrics here
    }
    // keep default exponential backoff behavior
    return defaultDelay;
  },
});
onRetry should support sync or async functions.
The above works for me, but my needs are minimal; I just need to be able to show retries in the UI.
I think it might be preferable for onRetry to return a boolean indicating whether the retry should happen, rather than how long to delay. You can just use an async function to control the delay yourself. The callback should also include the retryCount. A sketch of that variant is below.
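For illustration, that variant could look like this (hypothetical API; a boolean-returning onRetry does not exist today):

let { text } = await generateText({
  model: yourModel,
  prompt,
  onRetry: async ({ error, retryCount }) => {
    if (retryCount >= 5) return false; // give up
    if (!APICallError.isInstance(error) || error.statusCode !== 429) return false; // only retry 429s
    // control the delay yourself by awaiting before returning
    await new Promise((resolve) => setTimeout(resolve, Math.min(100 * 2 ** retryCount, 2000)));
    return true; // retry
  },
});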
This is similar to other popular backoff/retry libraries such as https://www.npmjs.com/package/exponential-backoff
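As a workaround today, one can wrap the call with that library and disable the SDK's own retries (a sketch; check the exponential-backoff docs for the exact option names):

import { backOff } from "exponential-backoff";
import { generateText, APICallError } from "ai";

const { text } = await backOff(
  () => generateText({ model: yourModel, prompt, maxRetries: 0 }),
  {
    numOfAttempts: 10,
    startingDelay: 500,
    maxDelay: 2000,
    timeMultiple: 2,
    retry: (e) => APICallError.isInstance(e) && e.statusCode === 429, // only retry rate limits
  },
);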
The number of retries is configurable via maxRetries. The docs say nothing about exponential backoff. Even with maxRetries, the backoff time is not configurable. Error-specific retry is not configurable.
For context, this does seem to use exponential backoff under the hood, even if there's no explicit reference to it in the docs. See: ai/packages/ai/src/util/retry-with-exponential-backoff.ts
(FYI @ccssmnn)