langflow icon indicating copy to clipboard operation
langflow copied to clipboard

Component Fails to Send Image Data to Multimodal Models (e.g., gpt-4o-mini)

Open kcpan-glitch opened this issue 4 months ago • 3 comments

Bug Description

When using a multimodal model like gpt-4o-mini within a component (e.g., the standard Agent or OpenAI component), the component fails to correctly pass image data to the model's API. The model then returns a text-only error message stating it cannot view images, even though the model itself is vision-capable. This indicates the component is likely using a text-only API format instead of the required multimodal format.

Reproduction

  1. Create a flow using a component that calls an OpenAI model (like the Agent component).

  2. In the component's settings, select gpt-4o-mini as the model.

  3. Use a Chat Input to send a text prompt (e.g., "what is in this image?") along with an uploaded image file.

  4. Execute the flow.

  5. The model returns an error: "I can't view or interpret images directly."

Expected behavior

The component should correctly format the multimodal input (image + text) and send it to the gpt-4o-mini API, resulting in the model providing a description of the image.

Who can help?

No response

Operating System

macOS

Langflow Version

1.5.13

Python Version

None

Screenshot

No response

Flow File

No response

kcpan-glitch avatar Aug 21 '25 07:08 kcpan-glitch

Same issue here, but on 1.6.5..

onghuisheng avatar Oct 23 '25 07:10 onghuisheng

I’m having the same issue in v1.6.5 where, when using Model Provider = OpenAI, any model name specified in the Agent is unable to recognize images. It always responds with the same message: “I can't view or interpret images directly.” However, the language model still works fine.

Previously in v1.2.0, both the agent and the language model were able to recognize images.

byzxc avatar Oct 23 '25 07:10 byzxc

I have tested multiple scenarios and identified the following results:

  1. Component "Language Model": OpenAI, Anthropic, and Gemini can all read uploaded images thru run flow API calls.
  2. Component "Agent": Anthropic and Gemini can read uploaded images; however, OpenAI CANNOT read uploaded images thru run flow API calls.
  3. Component "Agent" has a bug: if LANGFLOW_CONFIG_DIR is not explicitly set in the environment or configuration file, it defaults to using "/" (or "C:" on Windows) as the base path to access uploaded files.
  4. Component "ChatInput" has a bug: when should_store_message=true, any API calls containing the "files" attribute will fail with an "index out of bounds" error.

104bertchen avatar Nov 14 '25 13:11 104bertchen

The problem has been fixed in the main OS for macOS and Linux.

For Windows, we still have an issue regarding the different way the Path is handled.

Ongoing PR with final corrections: https://github.com/langflow-ai/langflow/pull/10941

Empreiteiro avatar Dec 17 '25 14:12 Empreiteiro