Firecrawl and Jina Reader APIs don't work

Open conflock opened this issue 11 months ago • 3 comments

Self Checks

  • [x] This is only for bug report, if you would like to ask a question, please head to Discussions.
  • [x] I have searched for existing issues, including closed ones.
  • [x] I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
  • [x] [FOR CHINESE USERS] Please be sure to submit the issue in English, otherwise it will be closed. Thank you! :)
  • [x] Please do not modify this template :) and fill in all the required fields.

Dify version

0.15.2

Cloud or Self Hosted

Self Hosted (Docker)

Steps to reproduce

Both website-crawling APIs, Firecrawl and Jina Reader, fail. Firecrawl throws the error "Failed to start crawl job. Status code: 400. Error: Bad Request", and Jina Reader also throws a Firecrawl error (!!!). At the moment it is not possible to use websites as a data source.

✔️ Expected Behavior

Crawl websites with external crawling APIs

❌ Actual Behavior

Both APIs throw errors

conflock · Jan 27 '25 16:01

Hello @conflock! I'm here to help you with any bugs, questions, or contributions you have while waiting for a human maintainer. Let's work together to solve this issue.

I found a similar unsolved discussion regarding the issue with the Firecrawl and Jina Reader APIs, which might be relevant to your situation. You can check it out here [1].

Regarding the errors you're encountering, there are a few potential causes to consider:

  1. Missing or Invalid Parameters: Ensure that all required parameters like url, options, and limit are correctly provided in your requests. Missing or invalid parameters can lead to a "Bad Request" error.

  2. Invalid Provider: Verify that the provider specified is either "firecrawl" or "jinareader". Any other value will result in an error.

  3. API Key Issues: Check that the API key is correctly configured and decrypted. An incorrect API key can cause authentication errors.

  4. Incorrect API Endpoint or Parameters: Make sure that the endpoints and parameters used in your requests are correct. Any mismatch can result in a 400 error.

These are some areas to investigate to resolve the issue. If you need further assistance, please provide more details about your configuration and the exact requests being made.
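
The first three checks above can be run locally before any request is sent. Below is a minimal sketch in Python. It is not Dify's actual code: the function name `validate_crawl_request`, the `SUPPORTED_PROVIDERS` set, and the assumption that `limit` is a key inside `options` are all hypothetical, chosen only to illustrate the checks.

```python
from urllib.parse import urlparse

# Hypothetical constant: the two provider names mentioned in this thread.
SUPPORTED_PROVIDERS = {"firecrawl", "jinareader"}


def validate_crawl_request(provider, url, options, api_key):
    """Return a list of problems found; an empty list means nothing obvious is wrong."""
    problems = []

    # Check 2: the provider must be one of the supported names.
    if provider not in SUPPORTED_PROVIDERS:
        problems.append(f"unknown provider: {provider!r}")

    # Check 1: the URL must be absolute, with an http(s) scheme and a host.
    parsed = urlparse(url or "")
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        problems.append(f"invalid url: {url!r}")

    # Check 1 (continued): options must be a dict with a positive integer limit.
    # Assumption: limit lives inside options; the real request shape may differ.
    if not isinstance(options, dict):
        problems.append("options must be a dict")
    else:
        limit = options.get("limit")
        if isinstance(limit, bool) or not isinstance(limit, int) or limit < 1:
            problems.append(f"invalid limit: {limit!r}")

    # Check 3: the API key must be a non-empty string.
    if not isinstance(api_key, str) or not api_key.strip():
        problems.append("API key is empty")

    return problems


# Example: list the problems with a request before sending it.
for problem in validate_crawl_request("jinareader", "example.com", {"limit": 0}, ""):
    print(problem)
```

An empty result only rules out the obvious client-side mistakes. A 400 can still come from the server, for example if the endpoint or payload format expected by the crawling service has changed.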

dosubot[bot] · Jan 27 '25 16:01

Same here. It occurs when creating a dataset at /datasets/create and selecting "from website".

btbxbob · Feb 09 '25 02:02

Hi, @conflock. I'm Dosu, and I'm helping the Dify team manage their backlog. I'm marking this issue as stale.

Issue Summary:

  • Reported issues with Firecrawl and Jina Reader APIs in Dify version 0.15.2 when self-hosted via Docker.
  • Encountering a "Bad Request" error with status code 400.
  • Suggested checks include verifying parameters, provider, API key configuration, and endpoint accuracy.
  • @btbxbob confirmed the same issue when creating a dataset from a website, indicating a broader problem.

Next Steps:

  • Please confirm if this issue is still relevant with the latest version of Dify. If so, you can keep the discussion open by commenting here.
  • If there is no further activity, this issue will be automatically closed in 15 days.

Thank you for your understanding and contribution!

dosubot[bot] · Mar 15 '25 16:03