Image prompt with Gemini and GPT-4o-mini

Open dschulzdev opened this issue 8 months ago • 1 comments

Describe the bug

I want to run evaluations with an image and a user prompt on Gemini 2.0 Flash and GPT-4o-mini. Image input works with GPT-4o-mini, but Gemini returns a bad request. Am I missing something, or does image prompting with Gemini not work?

To Reproduce

This code produces the error during evaluation:

import promptfoo from "promptfoo";

const config = {
  description: "Test",
  prompts: [
    () => [
      {
        role: "user",
        content: [
          {
            type: "text",
            text: `What is in this image?`,
          },
          {
            type: "image_url",
            image_url: {
              url: `https://upload.wikimedia.org/wikipedia/commons/thumb/5/55/Foto_de_Jose_Eloy_Mart%C3%ADnez_Sim%C3%B3_%28edt.%29.jpg/1280px-Foto_de_Jose_Eloy_Mart%C3%ADnez_Sim%C3%B3_%28edt.%29.jpg`,
            },
          },
        ],
      },
    ],
  ],
  providers: [
    {
      id: "google:gemini-2.0-flash",
    },
  ],
  tests: [
    {
      assert: [
        {
          type: "not-is-json",
        },
      ],
    },
  ],
  writeLatestResults: true,
};

async function run() {
  await promptfoo.evaluate(config, {});
}
run();

Expected behavior

Gemini should use the image as an input for my prompt.

Screenshots

Image

  • Promptfoo version: 0.109.0

dschulzdev avatar Apr 08 '25 21:04 dschulzdev

@domephant Hey, I don't think promptfoo converts prompts from the OpenAI format to the Gemini-specific format in this case, but you can make it work by specifying the correct request format, like this:

const promptfoo = require("promptfoo");

const base64Image =
  "iVBORw0KGgoAAAANSUhEUgAAAAgAAAAIAQMAAAD+wSzIAAAABlBMVEX///+/v7+jQ3Y5AAAADklEQVQI12P4AIX8EAgALgAD/aNpbtEAAAAASUVORK5CYII";
const mimeType = "image/png";

const config = {
  description: "Test Gemini 2.0 Flash with inline image",
  prompts: [
    () => [
      {
        role: "user",
        parts: [
          {
            text: "What is in this image?",
          },
          {
            inline_data: {
              mime_type: mimeType,
              data: base64Image,
            },
          },
        ],
      },
    ],
  ],
  providers: [
    {
      id: "google:gemini-2.0-flash",
    },
  ],
  tests: [
    {
      assert: [
        {
          type: "not-is-json",
        },
      ],
    },
  ],
  writeLatestResults: true,
};

async function run() {
  const result = await promptfoo.evaluate(config, {});
  console.log(JSON.stringify(result, null, 2));
}

run();
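
The config above hard-codes a base64 string. As a rough sketch of how that string could be produced from real image bytes, the helpers below wrap a byte array in the same `inline_data` shape. The names `toInlineDataPart` and `buildGeminiMessage` are hypothetical and not part of promptfoo; the 8-byte PNG signature is only a stand-in for a real file's contents.

```javascript
// Hypothetical helper: wrap raw image bytes in a Gemini-style "inline_data" part.
function toInlineDataPart(bytes, mimeType) {
  return {
    inline_data: {
      mime_type: mimeType,
      // Buffer is a Node.js global; it base64-encodes the raw bytes.
      data: Buffer.from(bytes).toString("base64"),
    },
  };
}

// Hypothetical helper: build the single-turn user message used in the config above.
function buildGeminiMessage(question, bytes, mimeType) {
  return [
    {
      role: "user",
      parts: [{ text: question }, toInlineDataPart(bytes, mimeType)],
    },
  ];
}

// Stand-in input: the 8-byte PNG file signature. In practice the bytes would
// come from a real file, e.g. require("fs").readFileSync("image.png").
const pngSignature = Uint8Array.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);
const message = buildGeminiMessage("What is in this image?", pngSignature, "image/png");
console.log(JSON.stringify(message, null, 2));
```

The returned array could then be handed back from the prompt function in place of the literal message, assuming the provider accepts it exactly as it accepts the hard-coded version.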

MrFlounder avatar Apr 09 '25 00:04 MrFlounder

Hi @domephant,

Thanks for this question. The issue is that OpenAI and Gemini use different formats for image inputs - promptfoo doesn't automatically convert between them.

For evaluating both models, use provider-specific prompts:

openai-prompt.json:

[
  {
    "role": "user",
    "content": [
      {
        "type": "text",
        "text": "{{question}}"
      },
      {
        "type": "image_url",
        "image_url": {
          "url": "{{imageUrl}}"
        }
      }
    ]
  }
]

gemini-prompt.json:

[
  {
    "role": "user",
    "parts": [
      {
        "text": "{{question}}"
      },
      {
        "inline_data": {
          "mime_type": "image/jpeg",
          "data": "{{imageData}}"
        }
      }
    ]
  }
]

promptfooconfig.yaml:

prompts:
  - id: file://openai-prompt.json
    label: openai_vision
  - id: file://gemini-prompt.json
    label: gemini_vision

providers:
  - id: openai:gpt-4o-mini
    prompts: [openai_vision]
  - id: google:gemini-2.0-flash
    prompts: [gemini_vision]

tests:
  - vars:
      question: What do you see?
      imageUrl: https://your-image-url.jpg
      imageData: file://path/to/image.jpg  # file:// auto-converts to base64

See Prompts documentation for more on provider-specific prompts.
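
If maintaining two prompt files is inconvenient, the difference between the two formats can also be bridged in code. The following is only a sketch under stated assumptions: `openAiToGemini` and `convertPart` are hypothetical names, not promptfoo functions, and the sketch handles only `data:` URLs, since a remote image URL would first have to be downloaded and base64-encoded.

```javascript
// Hypothetical converter: one OpenAI-style content part -> one Gemini-style part.
function convertPart(part) {
  if (typeof part === "string") {
    return { text: part };
  }
  if (part.type === "text") {
    return { text: part.text };
  }
  if (part.type === "image_url") {
    // Only base64 data URLs are handled here, e.g. "data:image/png;base64,...."
    const match = /^data:([^;,]+);base64,(.+)$/.exec(part.image_url.url);
    if (!match) {
      throw new Error("Remote image URLs must be downloaded and base64-encoded first");
    }
    return { inline_data: { mime_type: match[1], data: match[2] } };
  }
  throw new Error(`Unsupported content part type: ${part.type}`);
}

// Hypothetical converter: OpenAI-style messages -> Gemini-style messages.
// Assumption: "assistant" maps to Gemini's "model" role; other roles pass through.
function openAiToGemini(messages) {
  return messages.map((message) => ({
    role: message.role === "assistant" ? "model" : message.role,
    parts: [].concat(message.content).map(convertPart),
  }));
}

const converted = openAiToGemini([
  {
    role: "user",
    content: [
      { type: "text", text: "What do you see?" },
      { type: "image_url", image_url: { url: "data:image/png;base64,iVBORw0KGgo=" } },
    ],
  },
]);
console.log(JSON.stringify(converted, null, 2));
```

The provider-specific prompt files shown above remain the simpler route; a converter like this mainly helps when prompts are already generated programmatically.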

If you're still having issues, feel free to comment or re-open.

Thanks!

mldangelo avatar Oct 20 '25 06:10 mldangelo