
[Question]: Web crawling and content extraction

Open raikloe opened this issue 10 months ago • 5 comments

Self Checks

  • [x] I have searched for existing issues, including closed ones.
  • [x] I confirm that I am using English to submit this report (Language Policy).
  • [x] Non-English title submissions will be closed directly (Language Policy).
  • [x] Please do not modify this template :) and fill in all the required fields.

Describe your problem

About a year ago, a feature request for web crawling and content extraction was opened (#315). However, it was closed without comment, although it was briefly included in the 2024 roadmap. Is there a reason for this? Is it foreseeable that the feature will be developed in the future?

I would be pleased if this request is well received.

raikloe avatar Mar 13 '25 09:03 raikloe

Hiya, v0.17.0 introduced Tavily-based web search in the chat assistant, and v0.17.2, the latest release, integrated this feature into the Retrieval agent component.

Does this suffice?

writinwaters avatar Mar 13 '25 10:03 writinwaters

@writinwaters Thanks for the hint! I will check whether it helps 👍 Am I right in assuming that the actual feature request is outsourced to an external provider?

raikloe avatar Mar 13 '25 13:03 raikloe

Hmm... yes and no. We integrated Tavily, but there is more coming in the pipeline. It is more about not reinventing the wheel.

writinwaters avatar Mar 14 '25 02:03 writinwaters

We have internal Confluence pages that need scraping. Any chance of integrating Atlassian products?

hlx98007 avatar Mar 17 '25 07:03 hlx98007

Hi there @hlx98007, this is not on our roadmap. Still, you can file a feature request for it.

writinwaters avatar Mar 17 '25 08:03 writinwaters