Extractor function does not work for a file given by URL link
Self Checks
- [x] This is only for bug report, if you would like to ask a question, please head to Discussions.
- [x] I have searched for existing issues, including closed ones.
- [x] I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
- [x] [FOR CHINESE USERS] Please be sure to submit the issue in English, otherwise it will be closed. Thank you! :)
- [x] Please do not modify this template :) and fill in all the required fields.
Dify version
1.1.0
Cloud or Self Hosted
Self Hosted (Docker)
Steps to reproduce
The extractor does not work for a file given by URL link. For instance, given the link https://github.com/langgenius/dify/blob/main/README.md, the chatPaper app showed an error message.
server.log also showed: 2025-03-19 18:26:12.465 INFO [Dummy-7] [_client.py:1038] - HTTP Request: GET https://github.com/langgenius/dify/blob/main/README.md "HTTP/1.1 200 OK"
✔️ Expected Behavior
This worked fine in an earlier version.
❌ Actual Behavior
The app stopped running and gave no clear error message.
I found a similar closed issue titled "Unable to retrieve url from file" which might be relevant to your problem. This issue was resolved by a pull request that fixed the error of being unable to retrieve a URL from a file [1].
Regarding the error messages you're encountering, the ssrf_proxy.py file is responsible for handling URL requests and includes timeout settings that could lead to errors if exceeded. A timeout error, like the one you mentioned in the server.log, could be due to the request taking longer than the specified timeout settings, network issues, or server unavailability [2].
The extractor function processes uploaded files and URL links by validating the file path. If the path is a URL, it attempts to download the file. Common validation errors include invalid URLs or file paths, and unsupported file types [3].
The Pydantic validation error you mentioned could be related to the input data not matching the expected schema, which might be due to an invalid file URL or incorrect data structure [3].
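The validate-then-download flow described above can be sketched roughly as follows. This is not Dify's actual extractor code: `classify_source` and `SUPPORTED_SUFFIXES` are hypothetical names, and the allow-list of file types is an assumption made only for illustration.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit

# Hypothetical allow-list; the real extractor's supported types may differ.
SUPPORTED_SUFFIXES = {".md", ".txt", ".pdf", ".docx"}


def classify_source(path: str) -> str:
    """Return "url" or "local", or raise ValueError for an unusable source."""
    parts = urlsplit(path)
    if parts.scheme in ("http", "https"):
        # A URL without a host cannot be downloaded.
        if not parts.netloc:
            raise ValueError(f"Invalid file url: {path!r}")
        # Look only at the URL path so a query string does not hide the suffix.
        suffix = PurePosixPath(parts.path).suffix.lower()
        kind = "url"
    else:
        suffix = PurePosixPath(path).suffix.lower()
        kind = "local"
    if suffix not in SUPPORTED_SUFFIXES:
        raise ValueError(f"Unsupported file type: {suffix or '(none)'}")
    return kind
```

A caller would download the file only when the result is `"url"`, and treat a `ValueError` as the kind of validation error mentioned above.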
Given the URL https://github.com/langgenius/dify/blob/main/README.md, running the chatPaper app produced an error message.
Regarding the file load functionality, it was fixed in version 1.1.0.
I encountered the same issue: the document extractor fails when a remote URL is used.
@sweetkey Hi there,
Your log shows you tried to access https:example.com, which is not a valid URL.
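For anyone wondering why `https:example.com` is rejected, a quick check with Python's standard `urllib.parse` shows that without the `//` the host part is empty and `example.com` is parsed as a path. The helper name `has_host` is made up for this sketch and is not part of Dify.

```python
from urllib.parse import urlsplit


def has_host(url: str) -> bool:
    """True only for http(s) URLs that actually contain a host."""
    parts = urlsplit(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


broken = urlsplit("https:example.com")
# The scheme is recognised, but there is no network location at all.
print(broken.scheme, repr(broken.netloc), broken.path)

print(has_host("https:example.com"))
print(has_host("https://example.com"))
```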
Sorry, I tried a few different URL links, and I should not have referred to that log for my problem. It was confusing.
See this new log, which I believe shows the README.md file being downloaded correctly, yet the error message still shows up. 2025-03-19 18:26:12.465 INFO [Dummy-7] [_client.py:1038] - HTTP Request: GET https://github.com/langgenius/dify/blob/main/README.md "HTTP/1.1 200 OK"
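One thing that may be worth ruling out (not confirmed anywhere in this thread as the cause): a `200 OK` only says the request succeeded, not what came back. A `github.com/.../blob/...` link serves GitHub's HTML page for the file, while the raw file content is served from `raw.githubusercontent.com`. Below is a minimal sketch of rewriting one form to the other. The helper `github_blob_to_raw` is hypothetical, and it assumes the branch or tag name contains no slashes.

```python
from urllib.parse import urlsplit


def github_blob_to_raw(url: str) -> str:
    """Rewrite a github.com blob link to its raw-content form, else return it unchanged."""
    parts = urlsplit(url)
    segments = parts.path.strip("/").split("/")
    # Expected shape: <owner>/<repo>/blob/<ref>/<path...>
    if parts.netloc != "github.com" or len(segments) < 5 or segments[2] != "blob":
        return url
    owner, repo, _, ref, *file_path = segments
    return f"https://raw.githubusercontent.com/{owner}/{repo}/{ref}/" + "/".join(file_path)


print(github_blob_to_raw("https://github.com/langgenius/dify/blob/main/README.md"))
```

Trying the raw link in the same app would show whether the failure depends on what the URL returns.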
my file url:
http://10.54.108.229:7300/images/250213/32041200400220250213135937524419-11i.jpeg?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=czmp-client%2F20250320%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20250320T053317Z&X-Amz-Expires=86400&X-Amz-SignedHeaders=host&X-Amz-Signature=3eef8454799f9b0643b2e21ecf7ba0bf793664fa6fb09ee81dd48207fa9d9bb3
got error:
1 validation error for File Value error, Invalid file url [type=value_error, input_value={'id':None, 'tenant id'....y file' 'url': None}, input_type=dict] For further information visit
https://errors.pydantic.dev/2.9/v/value_error
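Note that the `input_value` in the error contains `'url': None`, so by the time the file object was built, its URL field was empty. The sketch below shows how such a check can produce this message. It is a standard-library stand-in, not Dify's actual `File` model: the class `FileStub` and the `"remote_url"` value for `transfer_method` are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FileStub:
    """Hypothetical stand-in for a file model that validates its URL on creation."""

    transfer_method: str
    url: Optional[str] = None

    def __post_init__(self) -> None:
        # A file that is supposed to come from a remote URL must carry that URL.
        if self.transfer_method == "remote_url" and not self.url:
            raise ValueError("Invalid file url")


try:
    FileStub(transfer_method="remote_url", url=None)
except ValueError as exc:
    print(f"validation failed: {exc}")
```

If the real model behaves in a similar way, the question becomes why the URL was not passed through to the file object, rather than whether the URL itself is reachable.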
Have you solved it?
Hi, @sweetkey. I'm Dosu, and I'm helping the Dify team manage their backlog and am marking this issue as stale.
Issue Summary:
- The extractor function for files from URL links in the Dify application is malfunctioning in self-hosted Docker setups.
- The issue persists despite working in earlier versions, affecting users like you and llt22.
- I suggested potential causes like timeout settings or Pydantic validation errors, but you confirmed the error occurs even with valid URLs.
- Other users, including zicjin, have reported similar validation errors, indicating a broader issue with URL handling.
Next Steps:
- Please confirm if this issue is still relevant to the latest version of the Dify repository by commenting here.
- If no updates are provided, the issue will be automatically closed in 15 days.
Thank you for your understanding and contribution!