haystack-core-integrations
Add support for Reader API to convert HTMLs into Documents
Is your feature request related to a problem? Please describe.
There's no component for using Jina's Reader API with Haystack.
Describe the solution you'd like
A new JinaHTMLtoDocument component (name TBD) that uses Jina's Reader API to convert URLs into Haystack Documents. The component should accept a URL and output a Haystack Document.
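A minimal sketch of the proposed component, assuming the Reader API is called by prefixing the target URL with `https://r.jina.ai/` and returns the page as Markdown-style plain text. The optional `Authorization: Bearer <key>` header is an assumption about how an API key would be passed. The `Document` dataclass is a stand-in for `haystack.Document`, so the sketch runs without Haystack installed, and the injectable `fetch` parameter is a hypothetical design choice rather than anything the issue specifies.

```python
import urllib.request
from dataclasses import dataclass, field
from typing import Callable, Optional


# Stand-in for haystack.Document (which also has `content` and `meta` fields).
@dataclass
class Document:
    content: str
    meta: dict = field(default_factory=dict)


# Assumed Reader endpoint: the target URL is appended to this prefix.
READER_PREFIX = "https://r.jina.ai/"


def _http_get(url: str, api_key: Optional[str] = None) -> str:
    """Fetch `url` and return the response body decoded as UTF-8 text."""
    headers = {}
    if api_key:
        # Assumption: the key is sent as a bearer token.
        headers["Authorization"] = f"Bearer {api_key}"
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8")


class JinaHTMLtoDocument:
    """Hypothetical component: converts one URL into one Document via Jina Reader."""

    def __init__(
        self,
        api_key: Optional[str] = None,
        fetch: Callable[[str, Optional[str]], str] = _http_get,
    ):
        self.api_key = api_key
        # Injectable fetcher so the component can be tested without network access.
        self.fetch = fetch

    def run(self, url: str) -> dict:
        # Ask the Reader API for the page content, then wrap it in a Document.
        text = self.fetch(READER_PREFIX + url, self.api_key)
        # Haystack components return a dict keyed by output name.
        return {"document": Document(content=text, meta={"url": url})}
```

In a real Haystack 2.x integration the class would also carry the `@component` decorator, and `run` would be annotated with `@component.output_types(document=Document)`. For a quick offline check, pass a fake fetcher such as `lambda url, key: "# stub"` instead of making an HTTP call.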
Describe alternatives you've considered
- The component could instead output a Markdown file, which users would then pass to MarkdownConverter in a pipeline (less intuitive in Haystack, but it might have advantages)
- Depending on how the Reader API works, the component could accept a list of URLs and return a list of Haystack Documents
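A rough sketch of the list-based alternative. Whether the Reader API can take several URLs in a single request is exactly the open question above, so this version simply issues one request per URL (assumed `https://r.jina.ai/<url>` endpoint) and keeps the returned Markdown as-is in `content`, tagging it in `meta` so a downstream Markdown converter could pick it up. `urls_to_documents`, `reader_get`, and the `content_type` meta key are hypothetical names, and `Document` again stands in for `haystack.Document`.

```python
import urllib.request
from dataclasses import dataclass, field
from typing import Callable, List, Optional


# Stand-in for haystack.Document.
@dataclass
class Document:
    content: str
    meta: dict = field(default_factory=dict)


def reader_get(url: str, api_key: Optional[str] = None) -> str:
    """Call the (assumed) Reader endpoint for a single URL and return its text."""
    headers = {"Authorization": f"Bearer {api_key}"} if api_key else {}
    request = urllib.request.Request("https://r.jina.ai/" + url, headers=headers)
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8")


def urls_to_documents(
    urls: List[str],
    fetch: Callable[[str, Optional[str]], str] = reader_get,
    api_key: Optional[str] = None,
) -> List[Document]:
    """Convert each URL into one Document, preserving input order."""
    documents = []
    for url in urls:
        markdown = fetch(url, api_key)  # one Reader request per URL
        documents.append(
            Document(content=markdown, meta={"url": url, "content_type": "text/markdown"})
        )
    return documents
```

If the Reader API does support batching, only the loop body would need to change; the one-URL-in, one-Document-out mapping could stay the same.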