ragna

[ENH] - Add support for parsing HTML

Open nenb opened this issue 1 year ago • 0 comments

Feature description

HTML tags encode their own kind of information about a document, which can help context generation. In some cases it may be preferable to use HTML sources rather than other available sources for context generation (e.g. ar5iv-HTML vs. arXiv-PDF, since PDF is challenging to parse accurately).

We should add support for parsing HTML documents out of the box.
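
As a rough illustration of how HTML tags could inform context generation, here is a minimal sketch that groups visible text under the nearest preceding heading, using only Python's standard-library `html.parser`. The names `HeadingAwareExtractor` and `extract_sections` are hypothetical and not part of ragna's API; how this would plug into ragna's document handling is left open.

```python
from html.parser import HTMLParser


class HeadingAwareExtractor(HTMLParser):
    """Hypothetical helper: collect visible text grouped under the latest h1-h6."""

    HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}
    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.sections = []        # list of (heading, text) tuples
        self._heading = ""        # heading currently in effect ("" before the first one)
        self._heading_parts = []  # text fragments of the heading being read
        self._body_parts = []     # text fragments seen under the current heading
        self._in_heading = False
        self._skip_depth = 0      # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1
        elif tag in self.HEADINGS:
            # A new heading closes the previous section.
            self._flush()
            self._in_heading = True
            self._heading_parts = []

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1
        elif tag in self.HEADINGS and self._in_heading:
            self._heading = " ".join(self._heading_parts)
            self._in_heading = False

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse runs of whitespace
        if not text or self._skip_depth:
            return
        if self._in_heading:
            self._heading_parts.append(text)
        else:
            self._body_parts.append(text)

    def _flush(self):
        # Store the accumulated body text under the heading in effect.
        if self._body_parts:
            self.sections.append((self._heading, " ".join(self._body_parts)))
            self._body_parts = []

    def close(self):
        super().close()
        self._flush()


def extract_sections(html):
    """Return a list of (heading, text) pairs for an HTML string."""
    parser = HeadingAwareExtractor()
    parser.feed(html)
    parser.close()
    return parser.sections
```

Each `(heading, text)` pair could then serve as a chunk that carries its section title along with it, which is the kind of structure a plain-text or PDF extraction tends to lose. A real implementation would likely also want to handle tables, lists, and links, and might prefer a dedicated HTML library over the standard-library parser.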

Value and/or benefit

  • It opens up a wide range of documents available on the web.
  • It offers the possibility of more accurate context generation in certain situations.
  • It means that context can be easily generated from real-time sources (e.g. news websites) or social media (e.g. Hacker News).

Anything else?

No response

nenb, Dec 17 '23 18:12