haystack-core-integrations

Add support for Reader API to convert HTML pages into Documents

Open • bilgeyucel opened this issue 10 months ago • 1 comment

Is your feature request related to a problem? Please describe.

There's no component to use Jina's Reader API with Haystack.

Describe the solution you'd like

A new JinaHTMLtoDocument (name TBD) component that uses Jina's Reader API to convert URLs into Haystack Documents. The component should accept a URL and output a Haystack Document.
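To make the request concrete, here is a rough sketch of what such a component could look like. It is not an existing Haystack or Jina API: the class body, the `fetcher` parameter, and the `fetch_with_reader` helper are hypothetical, and the `Document` dataclass is a stand-in for `haystack.Document` so the snippet runs without Haystack installed. It assumes the Reader API is called by prefixing the target URL with `https://r.jina.ai/` and that the response body is the page rendered as text.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List
from urllib.request import Request, urlopen

# Assumption: the Reader API is reached by prefixing the target URL with this endpoint.
READER_ENDPOINT = "https://r.jina.ai/"


@dataclass
class Document:
    """Stand-in for haystack.Document so the sketch runs without Haystack installed."""
    content: str
    meta: Dict[str, str] = field(default_factory=dict)


def fetch_with_reader(url: str, timeout: float = 30.0) -> str:
    """Hypothetical helper: fetch the Reader rendering of `url` (performs a network call)."""
    request = Request(READER_ENDPOINT + url, headers={"User-Agent": "reader-sketch"})
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


class JinaHTMLtoDocument:
    """Hypothetical converter (name TBD): URLs in, one Document per URL out."""

    def __init__(self, fetcher: Callable[[str], str] = fetch_with_reader):
        # The fetcher is injectable so the component can be exercised offline.
        self.fetcher = fetcher

    def run(self, urls: List[str]) -> Dict[str, List[Document]]:
        # Keep the source URL in the metadata so downstream components can trace it.
        documents = [Document(content=self.fetcher(url), meta={"url": url}) for url in urls]
        return {"documents": documents}
```

A real implementation would presumably import `Document` from `haystack` and decorate the class with `@component` (plus `@component.output_types(documents=List[Document])` on `run`) so it can be added to a pipeline. The sketch takes a list of URLs, so a single URL is just a one-element list.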

Describe alternatives you've considered

  • The component could output a Markdown file instead, and users could then pass it to MarkdownConverter in a pipeline (less idiomatic for Haystack, but it might have advantages)
  • Depending on how the Reader API works, the component could accept a list of URLs and return a list of Haystack Documents

bilgeyucel • Apr 16 '24 09:04