[JS] Pass files correctly to other models, e.g. OpenAI models

Open peter-olom opened this issue 3 months ago • 2 comments

Describe the bug GenKit does not pass PDFs correctly to OpenAI. I use GenKit because I want a common interface for talking to various models; however, it appears that only Gemini models receive PDFs correctly. When the request is made, the following input is generated, as seen in the GenKit UI:

{
  "model": "openai/gpt-5-nano",
  "messages": [
    {
      "role": "system",
      "content": [
        {
          "text": "# System Instructions: ..."
        }
      ]
    },
    {
      "role": "user",
      "content": [
        {
          "media": {
            "url": "data:application/pdf;base64,JVBERi0..."
          }
        },
        {
          "text": "Extract structured information from this document. File: doc.pdf"
        }
      ]
    }
  ],
  "config": {},
  "output": {
    "schema": {
      "_def": {
        "unknownKeys": "strip",
        "catchall": {
          "_def": {
            "typeName": "ZodNever"
          },
          "~standard": {
            "version": 1,
            "vendor": "zod"
          }
        },
        "typeName": "ZodObject"
      },
      "~standard": {
        "version": 1,
        "vendor": "zod"
      },
      "_cached": null
    },
    "format": "json",
    "jsonSchema": {
      "type": "object",
      "properties": {
        "expected_response_1": {
          "type": "number"
        },
        "expected_response_2": {
          "type": "string",
          "default": ""
        }
      },
      "additionalProperties": true,
      "$schema": "http://json-schema.org/draft-07/schema#"
    }
  }
}

The request always fails with OpenAI models and only succeeds with a Gemini model, even though OpenAI supports passing documents both as URLs and as base64-encoded data, as seen here.

import OpenAI from "openai";
const client = new OpenAI();

const response = await client.responses.create({
    model: "gpt-5",
    input: [
        {
            role: "user",
            content: [
                {
                    type: "input_text",
                    text: "Analyze the letter and provide a summary of the key points.",
                },
                {
                    type: "input_file",
                    file_url: "https://www.berkshirehathaway.com/letters/2024ltr.pdf",
                },
            ],
        },
    ],
});

console.log(response.output_text);
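The example above covers only the `file_url` case. For the base64 case, a minimal sketch of how the content item could be built is shown here. It assumes the Responses API accepts an `input_file` item carrying `filename` and `file_data` (a full `data:` URL); `toInputFile` and the `InputFileItem` type are hypothetical names used for illustration, not part of the OpenAI SDK or GenKit.

```typescript
// Hypothetical local type describing the two input_file shapes discussed above.
type InputFileItem =
  | { type: "input_file"; file_url: string }
  | { type: "input_file"; filename: string; file_data: string };

// Build an input_file content item from either a remote URL or a data: URL.
// Assumption: file_data takes the whole data URL, including the
// "data:application/pdf;base64," prefix.
function toInputFile(url: string, filename: string = "input.pdf"): InputFileItem {
  if (url.startsWith("data:")) {
    return { type: "input_file", filename, file_data: url };
  }
  return { type: "input_file", file_url: url };
}

// Usage: the result goes into the `content` array of a user message.
const remoteItem = toInputFile("https://www.berkshirehathaway.com/letters/2024ltr.pdf");
const inlineItem = toInputFile("data:application/pdf;base64,JVBERi0=", "doc.pdf");
```

Either item can then replace the `input_file` entry in the `content` array of the request shown above.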

To Reproduce Submit a PDF as a content.media.url value to an OpenAI model. The request fails.

Expected behavior The abstraction on top of OpenAI should convert the input to the format OpenAI expects. Reference document here.

Screenshots Image

Runtime (please complete the following information):

  • OS: MacOS (15.5)
  • GenKit Version: 1.19.2

Node version:

  • 22.11.0

Additional context Better documentation for passing files (non-media, e.g. PDFs) to models would help.

peter-olom · Sep 15 '25 08:09

Hey, I've been digging into this. The core problem is that the compat-oai plugin unconditionally formats any media part as an image_url, which is why it fails for PDFs.

For anyone looking for a temporary fix, you can patch the toOpenAITextAndMedia function. Just be aware that the plugin seems to use the older chat/completions API, so this fix will only work for base64-encoded data: URLs. Regular file URLs won't work until the plugin is updated to use OpenAI's newer /v1/responses API.

Here's the change that worked for me:

// In functions/node_modules/@genkit-ai/compat-oai/lib/model.js
// inside toOpenAITextAndMedia(part, visualDetailLevel)

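// contentType may be missing (the request above only sets media.url), so also check the data URL's MIME prefix
if (part.media && part.media.url.startsWith("data:") && (part.media.contentType === "application/pdf" || part.media.url.startsWith("data:application/pdf"))) {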
  return {
    type: "file",
    file: {
      filename: part.media.filename || "input.pdf",
      // The API expects the full data URL here, prefix and all
      file_data: part.media.url
    }
  };
}
// The original image handling logic stays the same
return {
  type: "image_url",
  image_url: { url: part.media.url, detail: visualDetailLevel }
};

You can use patch-package to apply this and keep it applied across reinstalls.
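For anyone who wants to try the routing logic outside of node_modules first, here is a self-contained sketch of the same idea. The types are simplified stand-ins, not the plugin's real ones; `toChatContentPart` and `mimeFromDataUrl` are hypothetical names; and `media.filename` is an assumed optional field, as in the patch above. It also falls back to the MIME type embedded in the data URL when `contentType` is not set.

```typescript
// Simplified stand-ins for the shapes involved (assumptions, not the plugin's types).
interface MediaPart {
  media: { url: string; contentType?: string; filename?: string };
}

type Detail = "auto" | "low" | "high";

type ChatContentPart =
  | { type: "file"; file: { filename: string; file_data: string } }
  | { type: "image_url"; image_url: { url: string; detail: Detail } };

// Read the MIME type out of a data URL,
// e.g. "data:application/pdf;base64,..." yields "application/pdf".
function mimeFromDataUrl(url: string): string | undefined {
  const match = /^data:([^;,]+)[;,]/.exec(url);
  return match ? match[1].toLowerCase() : undefined;
}

// Route base64 PDFs to a "file" part; everything else keeps the image_url shape.
function toChatContentPart(part: MediaPart, detail: Detail = "auto"): ChatContentPart {
  const { url, contentType, filename } = part.media;
  const mime = contentType ?? mimeFromDataUrl(url);
  if (mime === "application/pdf" && url.startsWith("data:")) {
    // The full data URL is passed through, prefix and all.
    return { type: "file", file: { filename: filename ?? "input.pdf", file_data: url } };
  }
  return { type: "image_url", image_url: { url, detail } };
}
```

Regular https PDF URLs still fall through to the image branch here, which matches the chat/completions limitation described above.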

It would be great if the Genkit team could fix this in the plugin itself. Properly distinguishing between images and other file types like PDFs would make the abstraction much more reliable. Thanks!

reiinii1 · Oct 11 '25 20:10

Same issue here. Hope this can be fixed.

farshid-campfire · Nov 18 '25 13:11