
The AI SDK fails to download a file from an R2 Bucket for Gemini only

Open lmyslinski opened this issue 4 months ago • 8 comments

Using AI SDK 4.3.19.

When I run the app locally, everything works with both providers that I'm using (Gemini and OpenAI). Once I build the app into a Docker image and deploy it on a VPS, the AI SDK silently fails to download the file from the R2 presigned URL and the request fails. I have a fallback configured that sends identical parameters to OpenAI, and that one works.

Here's a sample URL (gonna be expired): https://cee54bbea28917f2bdf4a30dd0672f18.r2.cloudflarestorage.com/dev-cv2b-storage/cv_files/d55d92d4-c94c-4d50-ad98-fc238e4c8260/ed36b351-7b12-48b9-b8fa-ab26dbe7d587/Andrew_Converter_CV.pdf?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Content-Sha256=UNSIGNED-PAYLOAD&X-Amz-Credential=719d564f633ec36c84e17bfc34486047%2F20250804%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20250804T185033Z&X-Amz-Expires=600&X-Amz-Signature=3a725dd8f64c83514984413e77ba2f9fe4a1290f5dfb997698337b5de93a0b10&X-Amz-SignedHeaders=host&x-amz-checksum-mode=ENABLED&x-id=GetObject

I've verified that the presigned URLs are accessible from both my machine and the VPS on which the app is deployed:

Image

Here's my somewhat messy call logic, with a wrapper:

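import type { CoreUserMessage } from "ai";

// Wrap a presigned URL in a user message containing a single PDF file part.
export function prepareUserInput(presignedUrl: string): CoreUserMessage {
  const url = new URL(presignedUrl);
  return {
    role: "user" as const,
    content: [
      {
        type: "file" as const,
        data: url,
        mimeType: "application/pdf",
      },
    ],
  };
}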

export async function extractBasicData(
  trace: LangfuseTraceClient,
  userMessage: CoreUserMessage,
  language: string,
  modelType: ModelType,
): Promise<BasicData> {
  try {
    return await callWithModelFallback<BasicData>(
      modelType,
      "object",
      {
        schema: basicDataSchema,
        messages: [
          {
            role: "system",
            content: `...`,
          },
          userMessage,
        ],
      },
      trace,
      "extractBasicData",
    );
  } catch (error) {
    logger.error({ error, msg: "Failed to extract basic data from file" });
    throw new Error("Failed to extract basic data from file");
  }
}

export async function callWithModelFallback<T>(
  modelType: ModelType,
  callType: "object" | "text",
  params: any,
  trace: LangfuseTraceClient,
  generationName: string,
): Promise<T> {
  try {
...
      const result = await generateObject({
        ...params,
        model: primaryModel,
        experimental_telemetry: {
          isEnabled: true,
          functionId: generationName,
          metadata: {
            langfuseTraceId: trace.id,
            langfuseUpdateParent: false, // Do not update the parent trace with execution results
          },
        },
      });
      return result.object as T;
   ...
  } catch (primaryError) {
...
        const result = await generateObject({
          ...params,
          model: fallbackModel,
          experimental_telemetry: {
            isEnabled: true,
            functionId: generationName,
            metadata: {
              langfuseTraceId: trace.id,
              langfuseUpdateParent: false, // Do not update the parent trace with execution results
            },
          },
        });
        return result.object as T;
...
    } catch (fallbackError) {
      logger.error(
        {
          modelType,
          callType,
          model: fallbackModel.modelId,
          err: fallbackError,
        },
        "Fallback model also failed",
      );
      throw fallbackError;
    }
  }
}

Gemini error:

{
  "err": {
   ...
    "data": {
      "error": {
        "code": 400,
        "message": "The document has no pages.",
        "status": "INVALID_ARGUMENT"
      }
    },
    "responseHeaders": {
      "alt-svc": "h3=\":443\"; ma=2592000,h3-29=\":443\"; ma=2592000",
      "content-encoding": "gzip",
      "content-type": "application/json; charset=UTF-8",
      "date": "Mon, 04 Aug 2025 18:56:22 GMT",
      "server": "scaffolding on HTTPServer2",
      "server-timing": "gfet4t7; dur=247",
      "transfer-encoding": "chunked",
      "vary": "Origin, X-Origin, Referer",
      "x-content-type-options": "nosniff",
      "x-frame-options": "SAMEORIGIN",
      "x-xss-protection": "0"
    },
    "isRetryable": false,
    "message": "The document has no pages.",
    "name": "AI_APICallError",
    "responseBody": "{\n  \"error\": {\n    \"code\": 400,\n    \"message\": \"The document has no pages.\",\n    \"status\": \"INVALID_ARGUMENT\"\n  }\n}\n",
    "stack": "AI_APICallError: The document has no pages.\n    at new e (/app/.next/server/chunks/node_modules_c673a5b3._.js:1:3298)\n    at new ei (/app/.next/server/chunks/node_modules_c673a5b3._.js:1:3783)\n    at <anonymous> (/app/.next/server/chunks/node_modules_c673a5b3._.js:3:5746)\n    at processTicksAndRejections (native)",
    "statusCode": 400,
    "type": "ei",
    "url": "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent"
  },

I've concluded that this must be an issue on the AI SDK's side: per the documentation, the file should be downloaded and embedded in the request, but that's clearly not happening. It's super weird that this happens only with Gemini and isn't reproducible on localhost.

lmyslinski avatar Aug 04 '25 19:08 lmyslinski

Just bumped into this. Did you ever find a solution, @lmyslinski?

kcc999 avatar Oct 13 '25 09:10 kcc999

@kcc999 Sadly not. I've switched to downloading the file onto the server and embedding it in the request myself.
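
A minimal sketch of that workaround, assuming a runtime with a global `fetch` (for example Node 18+). The names `prepareUserInputFromBytes` and `Fetcher` and the local message types are hypothetical; the types only mirror the `file` part shape from the original post (AI SDK 4.x) and are not imported from the SDK. The idea is to fetch the presigned URL in application code and hand over raw bytes, so the SDK never has to download anything itself:

```typescript
// Hypothetical local types mirroring the message shape used in the original post.
type FilePart = { type: "file"; data: Uint8Array; mimeType: string };
type UserMessage = { role: "user"; content: FilePart[] };

// Injectable so the download step can be replaced in tests.
type Fetcher = (url: string) => Promise<Response>;

export async function prepareUserInputFromBytes(
  presignedUrl: string,
  fetcher: Fetcher = (url) => fetch(url),
): Promise<UserMessage> {
  const response = await fetcher(presignedUrl);
  if (!response.ok) {
    // Fail loudly instead of forwarding an empty or partial document.
    throw new Error(`Download failed with HTTP ${response.status}`);
  }

  // Read the whole body once into memory.
  const bytes = new Uint8Array(await response.arrayBuffer());
  if (bytes.byteLength === 0) {
    throw new Error("Downloaded file is empty");
  }

  return {
    role: "user",
    content: [{ type: "file", data: bytes, mimeType: "application/pdf" }],
  };
}
```

The returned message can be passed wherever `userMessage` is used in `extractBasicData`. Failing early on a non-2xx response or an empty body also makes a download problem visible before the provider answers with something like "The document has no pages."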

lmyslinski avatar Oct 13 '25 09:10 lmyslinski

nice, just about to use this, following

grmkris avatar Oct 20 '25 19:10 grmkris

It would be nice to include an option to ignore a failed file download (or maybe an error callback) instead of failing the whole request.
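
Until something like that exists in the SDK, the behavior can be approximated in application code. This is only a sketch: `downloadAllOrSkip`, `Fetcher`, and `Downloaded` are hypothetical names, not SDK APIs, and it assumes a global `fetch`. Each URL is downloaded independently, failures are reported through a callback, and only the files that did download are returned:

```typescript
type Fetcher = (url: string) => Promise<Response>;
type Downloaded = { url: string; data: Uint8Array };

export async function downloadAllOrSkip(
  urls: string[],
  onError: (url: string, error: unknown) => void = () => {},
  fetcher: Fetcher = (url) => fetch(url),
): Promise<Downloaded[]> {
  const attempts = await Promise.all(
    urls.map(async (url): Promise<Downloaded | null> => {
      try {
        const response = await fetcher(url);
        if (!response.ok) throw new Error(`HTTP ${response.status}`);
        return { url, data: new Uint8Array(await response.arrayBuffer()) };
      } catch (error) {
        // Report the failure and skip this file instead of rejecting everything.
        onError(url, error);
        return null;
      }
    }),
  );
  // Keep only the successful downloads, in their original order.
  return attempts.filter((a): a is Downloaded => a !== null);
}
```

The caller can then build file parts from the returned bytes and decide whether a request with some files missing is still worth sending.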

abegehr avatar Oct 21 '25 15:10 abegehr

Any updates on this?

rockingrohit9639 avatar Oct 27 '25 10:10 rockingrohit9639

Yeah, looks like I've run into the same issue. This is bad, because other frameworks like Mastra are building on the AI SDK, and if this isn't fixed the errors will propagate.

Jraykekreate avatar Nov 06 '25 00:11 Jraykekreate

I got the same issue. Someone in this thread says: https://discuss.ai.google.dev/t/pdf-page-error-in-uploading/50627/14

"The PDF files I was using had been uploaded with gzip compression. After I tested uploading them without compression, everything worked fine."

But how do I control user-uploaded files?
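
One possible answer, sketched under the assumption that the bytes are downloaded in application code (on Node, using `node:zlib`) before being sent to the model: check the magic bytes and decompress if the payload turns out to be gzip. `normalizePdfBytes` is a hypothetical helper name. Gzip data starts with `0x1f 0x8b`, and a PDF starts with `%PDF-`:

```typescript
import { gunzipSync, gzipSync } from "node:zlib";

const GZIP_MAGIC = [0x1f, 0x8b];
const PDF_MAGIC = [0x25, 0x50, 0x44, 0x46, 0x2d]; // "%PDF-"

function startsWith(bytes: Uint8Array, magic: number[]): boolean {
  return bytes.length >= magic.length && magic.every((b, i) => bytes[i] === b);
}

// Returns plain PDF bytes, decompressing first if the input is gzip-compressed.
export function normalizePdfBytes(bytes: Uint8Array): Uint8Array {
  const body = startsWith(bytes, GZIP_MAGIC)
    ? new Uint8Array(gunzipSync(bytes))
    : bytes;
  if (!startsWith(body, PDF_MAGIC)) {
    throw new Error("Payload is not a PDF (missing %PDF- header)");
  }
  return body;
}

// Usage: a gzip-compressed PDF header round-trips back to the original bytes.
const sample = new TextEncoder().encode("%PDF-1.7\n");
const restored = normalizePdfBytes(gzipSync(sample));
console.log(restored.length === sample.length);
```

This does not change how users upload their files; it only normalizes what gets forwarded. Whether gzip is actually the cause in the R2 case above is not established in this thread.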

songkeys avatar Nov 12 '25 16:11 songkeys

@kcc999 I've just upgraded to AI SDK v5, I'll try this approach once more and let you know if it's still in place and if I manage to find a workaround

lmyslinski avatar Nov 14 '25 16:11 lmyslinski

The issue still persists: it works fine locally, but when deployed onto a VPS the AI SDK is unable to download the file when using Gemini models (OpenAI works):

Tue, 25 Nov 2025 14:39:21 UTC   {"environment":"staging","hostname":"de46150e58ab","level":"error","msg":"Failed to download https://cee54bbea28917f2bdf4a30dd0672f18.r2.cloudflarestorage.com/.../7001a611-dbc2-4364-801a-1b497bd038f8/30d2b609-f1ae-4aaa-8cae-ca9b0192b6e6/...?X-Amz-Algorithm=AWS4-HMAC-SHA256\u0026X-Amz-Content-Sha256=UNSIGNED-PAYLOAD\u0026X-Amz-Credential=719d564f633ec36c84e17bfc34486047%2F20251125%2Fus-east-1%2Fs3%2Faws4_request\u0026X-Amz-Date=20251125T143917Z\u0026X-Amz-Expires=600\u0026X-Amz-Signature=e548dbc2c4e56a231434ec112426d0a323aa9cf5f1d132ef252e9f4cf628d977\u0026X-Amz-SignedHeaders=host\u0026x-amz-checksum-mode=ENABLED\u0026x-id=GetObject: TypeError: ReadableStream is locked","name":"ai-providers","pid":1,"serviceUrl":"..."}
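
For context on the `ReadableStream is locked` message: a web `ReadableStream` allows only one active reader at a time, and acquiring a second reader on a locked stream throws a `TypeError`. The sketch below reproduces that generic behavior and shows the usual way to read a response body twice (clone before the first read). It is not the SDK's code path; the log does not show where the stream gets locked, and the function names here are hypothetical:

```typescript
// Acquire two readers on the same body to trigger the "locked" TypeError.
export function secondReaderError(): string {
  const response = new Response("pdf bytes");
  const body = response.body;
  if (body === null) throw new Error("expected a response body");

  body.getReader(); // the first reader locks the stream
  try {
    body.getReader(); // a second reader on a locked stream throws
    return "no error";
  } catch (error) {
    return error instanceof TypeError ? "TypeError" : "other error";
  }
}

// Safe pattern: clone the response before the first read, then read each copy once.
export async function readTwice(response: Response): Promise<[string, string]> {
  const copy = response.clone();
  return [await response.text(), await copy.text()];
}
```

If something in the deployed environment (for example an instrumented or wrapped `fetch`) consumes the body before the SDK does, an error of this kind would be one plausible result, but that is an assumption and not something the log confirms.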

lmyslinski avatar Nov 25 '25 14:11 lmyslinski