ragna
[ENH] - Add support for parsing HTML
Feature description
HTML tags encode structural information about a document (headings, paragraphs, lists), which can help context generation. In some cases it may be preferable to use HTML sources rather than other available sources for context generation (e.g. ar5iv HTML vs. arXiv PDF, since PDF is challenging to parse accurately).
We should add support for parsing HTML documents out of the box.
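To illustrate how tag information could feed context generation, here is a minimal sketch that splits an HTML document into (heading, text) chunks using only Python's standard-library `html.parser`. The names `HeadingChunker` and `chunk_html` are hypothetical and are not part of ragna; how a real implementation would plug into ragna's document handling is left open.

```python
from html.parser import HTMLParser


class HeadingChunker(HTMLParser):
    """Hypothetical helper (not ragna API): group text under its nearest heading."""

    HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}
    SKIPPED = {"script", "style"}  # their content is not readable text

    def __init__(self):
        super().__init__()
        self.chunks = []  # finished (heading, text) pairs
        self._heading = ""  # heading of the chunk being built
        self._text = []  # text fragments of the chunk being built
        self._in_heading = False
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1
        elif tag in self.HEADINGS:
            # A new heading closes the previous chunk.
            self._flush()
            self._heading = ""
            self._in_heading = True

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1
        elif tag in self.HEADINGS:
            self._in_heading = False

    def handle_data(self, data):
        if self._skip_depth:
            return
        text = " ".join(data.split())  # collapse runs of whitespace
        if not text:
            return
        if self._in_heading:
            self._heading = (self._heading + " " + text).strip()
        else:
            self._text.append(text)

    def _flush(self):
        if self._text:
            self.chunks.append((self._heading, " ".join(self._text)))
        self._text = []

    def close(self):
        super().close()
        self._flush()  # emit the final chunk


def chunk_html(html):
    """Return a list of (heading, text) pairs for the given HTML string."""
    parser = HeadingChunker()
    parser.feed(html)
    parser.close()
    return parser.chunks
```

Each chunk keeps the heading it appeared under, which is the kind of structural hint that is hard to recover from a PDF. A production parser would likely also need to handle tables, links, and malformed markup, which this sketch ignores.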
Value and/or benefit
- It opens up a wide range of documents available on the web.
- It offers the possibility of more accurate context generation in certain situations.
- It means that context can be easily generated from real-time sources (e.g. news websites) or social media (e.g. Hacker News).
Anything else?
No response