
Agent node (with vision-enabled LLM) not recognizing image

Open lema-founders opened this issue 1 month ago • 3 comments

Self Checks

  • [x] I have read the Contributing Guide and Language Policy.
  • [x] This is only for bug report, if you would like to ask a question, please head to Discussions.
  • [x] I have searched for existing issues, including closed ones.
  • [x] I confirm that I am using English to submit this report, otherwise it will be closed.
  • [x] [Chinese & non-English users] Please submit in English, otherwise the issue will be closed :)
  • [x] Please do not modify this template :) and fill in all the required fields.

Dify version

1.9.2

Cloud or Self Hosted

Self Hosted (Docker)

Steps to reproduce

  1. Create a chatflow with a file list input.
Image
  2. Upload a file to be sent to the chatflow. The resulting run input:

     {
       "files": [
         {
           "dify_model_identity": "dify__file",
           "id": null,
           "tenant_id": "e5bab243-a5df-8b13-99cb-ac0d7689963b",
           "type": "image",
           "transfer_method": "local_file",
           "remote_url": "https://dify.prod.com/files/9ce2316a-f7fe-4f29-becd-13c47faaff98/file-preview?timestamp=1762961440&nonce=44985ee079fbf23abff426989354aa62&sign=JPJbSl1ajsmbQuVFLatZpputGfPkR6w9vPP8ZbM2xt4%3D",
           "related_id": "9ce2316a-f7fe-4f29-becd-13c47faaff98",
           "filename": "png-transparent-logo-contracting-photography-logo-symbol.png",
           "extension": ".png",
           "mime_type": "image/png",
           "size": 9354,
           "url": "https://dify.prod.com/files/9ce2316a-f7fe-4f29-becd-13c47faaff98/file-preview?timestamp=1762961446&nonce=e46f9e126b262db6635bfd06b7c149d6&sign=NikHDv0DKq_A4iwcVDdOxeI203SNEkkVLHBmj-mXaGs%3D"
         }
       ],
       "ip_location": null,
       "sys.files": [],
       "sys.user_id": "7a8458f7-dc81-6564-90a4-5c9ca4d0c76b",
       "sys.app_id": "db0e13ff-ec94-4b87-a645-0afceefe164c",
       "sys.workflow_id": "81816333-f478-4554-a074-f9a2da250390",
       "sys.workflow_run_id": "48789beb-242f-45ea-1298-23675452b5c9",
       "sys.query": "what's this image",
       "sys.dialogue_count": 1
     }

  3. Add the files to the agentic node (GPT-5).

Image

✔️ Expected Behavior

The agent should recognize the image and, in this case, answer the sys.query "what's this image" properly.

❌ Actual Behavior

The agent does not recognize the image, even though it uses a vision-enabled model.

Image

lema-founders, Nov 12 '25 15:11

The most common reason a vision-enabled agent node doesn't recognize an uploaded image in Dify is that the image isn't passed to the model in the expected way. For vision models like GPT-5, the image must be referenced correctly in the agent's input payload. If you upload images via workflow steps or configuration, the agent may not receive them; for the vision feature to work as intended, upload images with the chat UI's upload button next to the message field. If the agent responds with "no image or link is attached," the image variable (files or sys.files) is likely empty or not mapped correctly in the workflow context, so the model never sees the image input (source).
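
One way to check the mapping is to look at the run input itself. The sketch below is only illustrative: it assumes the input has the shape shown in the report (a `files` list and a `sys.files` list whose entries carry a `type` field), and the helper name `image_sources` is hypothetical, not part of Dify.

```python
from typing import Any


def image_sources(run_input: dict[str, Any]) -> dict[str, int]:
    """Count image entries per file variable in a workflow run input.

    Assumes the shape from the report: each variable holds a list of file
    objects, and image files carry "type": "image".
    """
    counts = {}
    for key in ("files", "sys.files"):
        entries = run_input.get(key) or []
        counts[key] = sum(1 for f in entries if f.get("type") == "image")
    return counts


# Trimmed version of the run input from the report.
run_input = {
    "files": [{"type": "image", "mime_type": "image/png", "extension": ".png"}],
    "sys.files": [],
    "sys.query": "what's this image",
}

print(image_sources(run_input))
```

For the reported run, the image sits under `files` while `sys.files` is empty, so a node that reads only `sys.files` would receive no image.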

Check that the Vision feature is enabled in your workflow and that the selected model is still vision-capable. For self-hosted Docker, make sure the INTERNAL_FILES_URL environment variable is set; otherwise, image uploads may silently fail (source). Also, configuration changes only apply to new conversations, so start a new chat after updating settings.
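
A small script can confirm whether the variable is actually set before you restart the containers. This is a minimal sketch that parses dotenv-style text; the helper `read_env_value` and the sample contents are assumptions for illustration, and the right value for INTERNAL_FILES_URL depends on your deployment.

```python
from typing import Optional


def read_env_value(env_text: str, name: str) -> Optional[str]:
    """Return the value assigned to `name` in dotenv-style text, or None.

    Blank lines and comments are skipped, the last assignment wins, and an
    empty assignment counts as unset.
    """
    found = None
    for raw in env_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key.strip() == name:
            found = value.strip().strip('"').strip("'") or None
    return found


# Hypothetical .env contents; the values are placeholders, not recommendations.
sample = """
# file settings
FILES_URL=https://dify.prod.com
INTERNAL_FILES_URL=
"""

print(read_env_value(sample, "INTERNAL_FILES_URL"))  # None: present but empty
```

In practice the text would come from the `.env` file used by your Docker Compose setup.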

If you're passing files as an array but the agent expects a single file, use a list filter node or ensure only one image is uploaded. For best results, reference sys.files in the agent node's context and avoid adding it directly to the user prompt; let the vision setting handle it (source).
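
To make the array-versus-single-file point concrete, here is a plain-Python illustration of what such a filter does. It is not Dify's list filter node or its API; the function `first_image` and the extra `notes.pdf` entry are hypothetical, and the file objects are trimmed to the fields used.

```python
from typing import Any, Optional


def first_image(files: list[dict[str, Any]]) -> Optional[dict[str, Any]]:
    """Return the first file object whose type is "image", or None."""
    for f in files:
        if f.get("type") == "image":
            return f
    return None


files = [
    {"type": "document", "filename": "notes.pdf"},  # hypothetical extra upload
    {
        "type": "image",
        "filename": "png-transparent-logo-contracting-photography-logo-symbol.png",
    },
]

picked = first_image(files)
print(picked["filename"] if picked else "no image in list")
```

The same selection inside a workflow would hand the agent node one image object instead of the whole list.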

If you follow these steps and the agent still doesn't recognize the image, there may be a bug or a model/provider limitation. Let me know if you need help tracing the variable mapping or want to check your workflow configuration in detail.

To reply, just mention @dosu.



dosubot[bot], Nov 12 '25 15:11

@dosu how to specifically enable vision for the LLM used in agentic node?

lema-founders, Nov 12 '25 15:11

To enable vision for the LLM in an agentic node, (1) select a vision-capable model (like GPT-4o or GPT-5) in the agent node's model selector, and (2) toggle on the Vision feature in the node's configuration. The Vision toggle is usually found in the agent node settings; when enabled, it adds "image" to the allowed file types and lets the model process image inputs. If you don't see the toggle, the model may not support vision or the feature could be hidden by config. After enabling it, make sure your workflow passes the image file itself (not just a reference) to the agent node, and start a new conversation to apply the changes.

You can see how this works in the code for the vision config UI, which manages the toggle and allowed file types for image support (source). The model selector UI also filters and displays models with vision capability (source).

If the agent still doesn't recognize images, double-check that image uploads are enabled in your chatflow and that the image is actually being passed to the node (not just referenced in a variable).




dosubot[bot], Nov 12 '25 15:11