extractor function of the file from the url link doesn't work

Open sweetkey opened this issue 9 months ago • 8 comments

Self Checks

  • [x] This is only for bug report, if you would like to ask a question, please head to Discussions.
  • [x] I have searched for existing issues, including closed ones.
  • [x] I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
  • [x] [FOR CHINESE USERS] Please be sure to submit the issue in English, otherwise it will be closed. Thank you! :)
  • [x] Please do not modify this template :) and fill in all the required fields.

Dify version

1.1.0

Cloud or Self Hosted

Self Hosted (Docker)

Steps to reproduce

The extractor fails on a file supplied as a URL link. For instance, given the link https://github.com/langgenius/dify/blob/main/README.md, the chatPaper app showed an error message.

Image

server.log also showed: 2025-03-19 18:26:12.465 INFO [Dummy-7] [_client.py:1038] - HTTP Request: GET https://github.com/langgenius/dify/blob/main/README.md "HTTP/1.1 200 OK"

✔️ Expected Behavior

This worked in an earlier version.

❌ Actual Behavior

The app stopped running and gave no clear error message.

sweetkey avatar Mar 18 '25 21:03 sweetkey

I found a similar closed issue titled "Unable to retrieve url from file" which might be relevant to your problem. This issue was resolved by a pull request that fixed the error of being unable to retrieve a URL from a file [1].

Regarding the error messages you're encountering, the ssrf_proxy.py file is responsible for handling URL requests and includes timeout settings that could lead to errors if exceeded. A timeout error, like the one you mentioned in the server.log, could be due to the request taking longer than the specified timeout settings, network issues, or server unavailability [2].

The extractor function processes uploaded files and URL links by validating the file path. If the path is a URL, it attempts to download the file. Common validation errors include invalid URLs or file paths, and unsupported file types [3].
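The "invalid URL" case can be illustrated with a minimal sketch using only the Python standard library. The helper name `looks_like_http_url` is hypothetical and this is not Dify's actual validation code; it only shows why a value can carry an `https` scheme and still lack a host.

```python
from urllib.parse import urlparse


def looks_like_http_url(value: str) -> bool:
    """Return True if value has an http(s) scheme and a non-empty host part."""
    parts = urlparse(value)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


# A well-formed link has both a scheme and a host.
print(looks_like_http_url("https://github.com/langgenius/dify/blob/main/README.md"))
# Without the "//", the host ends up in the path and netloc is empty.
print(looks_like_http_url("https:example.com"))
# A local file path has no scheme at all.
print(looks_like_http_url("/tmp/README.md"))
```

A check like this only confirms the shape of the string. It says nothing about whether the server is reachable or whether the response is a supported file type.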

The Pydantic validation error you mentioned could be related to the input data not matching the expected schema, which might be due to an invalid file URL or incorrect data structure [3].


dosubot[bot] avatar Mar 18 '25 21:03 dosubot[bot]

Given the URL https://github.com/langgenius/dify/blob/main/README.md, running the chatPaper app produced an error message: Image

Regarding the file load functionality, it was fixed in version 1.1.0.

sweetkey avatar Mar 19 '25 00:03 sweetkey

Image Image

I encountered the same issue: the document extractor fails when given a remote URL.

llt22 avatar Mar 19 '25 02:03 llt22

@sweetkey Hi there, Your log shows you tried to access https:example.com, which is not a valid URL.

laipz8200 avatar Mar 19 '25 08:03 laipz8200

@sweetkey Hi there, Your log shows you tried to access https:example.com, which is not a valid URL.

Sorry, I tried a few different URLs, and I should not have cited that log for my problem. It was confusing.

See this new log, which I believe shows the README.md file being downloaded correctly, yet the error message still appears. 2025-03-19 18:26:12.465 INFO [Dummy-7] [_client.py:1038] - HTTP Request: GET https://github.com/langgenius/dify/blob/main/README.md "HTTP/1.1 200 OK"
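Note that the 200 OK in this log only shows that the request succeeded. A github.com `/blob/` URL serves the HTML page that renders the file, not the Markdown file itself, so the downloaded content may differ from what the extractor expects. Whether that is related to the error here is not confirmed in this thread. As a minimal sketch for testing with the raw file instead (the helper `github_blob_to_raw` is hypothetical and not part of Dify, and it assumes the branch or tag name contains no slash):

```python
from urllib.parse import urlparse


def github_blob_to_raw(url: str) -> str:
    """Rewrite a github.com 'blob' page URL to its raw.githubusercontent.com form.

    URLs that do not match the owner/repo/blob/ref/path shape are returned unchanged.
    """
    parts = urlparse(url)
    segments = parts.path.strip("/").split("/")
    if parts.netloc != "github.com" or len(segments) < 5 or segments[2] != "blob":
        return url
    owner, repo, _, ref, *path = segments
    return f"https://raw.githubusercontent.com/{owner}/{repo}/{ref}/{'/'.join(path)}"


print(github_blob_to_raw("https://github.com/langgenius/dify/blob/main/README.md"))
```

Feeding the rewritten URL to the app would show whether the failure depends on the content type of the response or happens regardless.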

Image

sweetkey avatar Mar 19 '25 18:03 sweetkey

Image

my file url:

http://10.54.108.229:7300/images/250213/32041200400220250213135937524419-11i.jpeg?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=czmp-client%2F20250320%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20250320T053317Z&X-Amz-Expires=86400&X-Amz-SignedHeaders=host&X-Amz-Signature=3eef8454799f9b0643b2e21ecf7ba0bf793664fa6fb09ee81dd48207fa9d9bb3

got error:

1 validation error for File
  Value error, Invalid file url [type=value_error, input_value={'id': None, 'tenant_id'....y file' 'url': None}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.9/v/value_error
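The truncated `input_value` in this error appears to end with `'url': None`, which would mean the URL field was empty when the record was validated, not that the URL string itself was rejected. A minimal stdlib-only sketch of that class of failure follows. `FileSketch`, `transfer_method`, and the `"remote_url"` value are assumptions made for illustration; this is neither Dify's File model nor Pydantic.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FileSketch:
    """Hypothetical stand-in for a file record; not Dify's actual File model."""

    transfer_method: str
    url: Optional[str] = None

    def __post_init__(self) -> None:
        # A remote file without a URL cannot be fetched, so reject it early.
        if self.transfer_method == "remote_url" and not self.url:
            raise ValueError("Invalid file url")


def try_build(**fields) -> str:
    """Build a FileSketch and report the outcome as a string."""
    try:
        FileSketch(**fields)
        return "ok"
    except ValueError as exc:
        return f"error: {exc}"


# The URL never reached the record, so validation fails.
print(try_build(transfer_method="remote_url", url=None))
# With the URL populated, the same record validates.
print(try_build(transfer_method="remote_url", url="http://example.com/a.jpeg"))
```

If this reading is right, the place to look would be the step that builds the file record from the workflow input, since the URL seems to be lost before validation runs.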

zicjin avatar Mar 20 '25 06:03 zicjin

Image

my file url:

http://10.54.108.229:7300/images/250213/32041200400220250213135937524419-11i.jpeg?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=czmp-client%2F20250320%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20250320T053317Z&X-Amz-Expires=86400&X-Amz-SignedHeaders=host&X-Amz-Signature=3eef8454799f9b0643b2e21ecf7ba0bf793664fa6fb09ee81dd48207fa9d9bb3

got error:

1 validation error for File
  Value error, Invalid file url [type=value_error, input_value={'id': None, 'tenant_id'....y file' 'url': None}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.9/v/value_error

Have you solved it?

mos-fine avatar Mar 28 '25 12:03 mos-fine

Hi, @sweetkey. I'm Dosu, and I'm helping the Dify team manage their backlog and am marking this issue as stale.

Issue Summary:

  • The extractor function for files from URL links in the Dify application is malfunctioning in self-hosted Docker setups.
  • The issue persists despite working in earlier versions, affecting users like you and llt22.
  • I suggested potential causes like timeout settings or Pydantic validation errors, but you confirmed the error occurs even with valid URLs.
  • Other users, including zicjin, have reported similar validation errors, indicating a broader issue with URL handling.

Next Steps:

  • Please confirm if this issue is still relevant to the latest version of the Dify repository by commenting here.
  • If no updates are provided, the issue will be automatically closed in 15 days.

Thank you for your understanding and contribution!

dosubot[bot] avatar Apr 28 '25 16:04 dosubot[bot]