Skyscraper
refactor(html)!: html module rewrite
Work in progress...
This is a major rewrite of the html module so that it follows the HTML standard.
The goal of this rewrite is to bring HTML parsing closer to how browsers behave by recovering from common errors in the HTML that websites serve. Since users of Skyscraper likely do not control the HTML served by a website, it is important to support at least some of this standardized error handling.
A secondary goal is to better integrate Skyscraper's html and xpath modules. The two diverged after the xpath module was rewritten, which began requiring an HtmlDocument to be converted to an XpathItemTree before it could be searched.