Component Fails to Send Image Data to Multimodal Models (e.g., gpt-4o-mini)
Bug Description
When using a multimodal model like gpt-4o-mini within a component (e.g., the standard Agent or OpenAI component), the component fails to correctly pass image data to the model's API. The model then returns a text-only error message stating it cannot view images, even though the model itself is vision-capable. This indicates the component is likely using a text-only API format instead of the required multimodal format.
Reproduction
-
Create a flow using a component that calls an OpenAI model (like the Agent component).
-
In the component's settings, select gpt-4o-mini as the model.
-
Use a Chat Input to send a text prompt (e.g., "what is in this image?") along with an uploaded image file.
-
Execute the flow.
-
The model returns an error: "I can't view or interpret images directly."
Expected behavior
The component should correctly format the multimodal input (image + text) and send it to the gpt-4o-mini API, resulting in the model providing a description of the image.
Who can help?
No response
Operating System
macOS
Langflow Version
1.5.13
Python Version
None
Screenshot
No response
Flow File
No response
Same issue here, but on 1.6.5..
I’m having the same issue in v1.6.5 where, when using Model Provider = OpenAI, any model name specified in the Agent is unable to recognize images. It always responds with the same message: “I can't view or interpret images directly.” However, the language model still works fine.
Previously in v1.2.0, both the agent and the language model were able to recognize images.
I have tested multiple scenarios and identified the following results:
- Component "Language Model": OpenAI, Anthropic, and Gemini can all read uploaded images thru run flow API calls.
- Component "Agent": Anthropic and Gemini can read uploaded images; however, OpenAI CANNOT read uploaded images thru run flow API calls.
- Component "Agent" has a bug: if LANGFLOW_CONFIG_DIR is not explicitly set in the environment or configuration file, it defaults to using "/" (or "C:" on Windows) as the base path to access uploaded files.
- Component "ChatInput" has a bug: when should_store_message=true, any API calls containing the "files" attribute will fail with an "index out of bounds" error.
The problem has been fixed in the main OS for macOS and Linux.
For Windows, we still have an issue regarding the different way the Path is handled.
Ongoing PR with final corrections: https://github.com/langflow-ai/langflow/pull/10941