Skyscraper
Fails to parse HTML
I've been comparing different libraries for parsing HTML and found that Skyscraper fails in some cases where others (sxd_html, or one written in Swift) work fine.
let link = "https://livejournal.com/";
let response = reqwest::blocking::get(link).expect("load url error");
let html_text = response.text().expect("get html text");
let document = skyscraper::html::parse(&html_text).expect("parse html");
The last line panics with: parse html: EndTagMismatch { end_name: "svg", open_name: "symbol" }
I'm currently working on a rewrite of the HTML module. It will follow the official HTML standard as defined by https://html.spec.whatwg.org/multipage/parsing.html. Hopefully that will solve your issues.
It's a lot of work, though, so I don't really have an ETA. It depends on how much free time I get.
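To make the difference concrete, here is a minimal sketch of why a strict tag-matching parser rejects this kind of input while a standard-style parser can recover. It is not Skyscraper's actual code: `Tag`, `StrictError`, `check_strict`, and `check_lenient` are hypothetical names invented for illustration, and the lenient variant only loosely imitates the end-tag recovery described in the WHATWG parsing algorithm (walk up the stack of open elements and close everything down to the matching one).

```rust
// Hypothetical illustration only; none of these names come from Skyscraper.

#[derive(Debug, PartialEq)]
enum Tag {
    Open(&'static str),
    Close(&'static str),
}

#[derive(Debug, PartialEq)]
enum StrictError {
    EndTagMismatch { end_name: String, open_name: String },
    UnexpectedEndTag { end_name: String },
}

/// Strict matching: an end tag must name the innermost open element.
fn check_strict(tags: &[Tag]) -> Result<(), StrictError> {
    let mut stack: Vec<&str> = Vec::new();
    for tag in tags {
        match tag {
            Tag::Open(name) => stack.push(*name),
            Tag::Close(name) => match stack.pop() {
                Some(open) if open == *name => {}
                Some(open) => {
                    return Err(StrictError::EndTagMismatch {
                        end_name: (*name).to_string(),
                        open_name: open.to_string(),
                    })
                }
                None => {
                    return Err(StrictError::UnexpectedEndTag {
                        end_name: (*name).to_string(),
                    })
                }
            },
        }
    }
    Ok(())
}

/// Lenient matching: on a mismatched end tag, close every element down to
/// the nearest matching open one; ignore end tags that match nothing.
/// Returns how many end tags needed recovery.
fn check_lenient(tags: &[Tag]) -> usize {
    let mut stack: Vec<&str> = Vec::new();
    let mut recovered = 0;
    for tag in tags {
        match tag {
            Tag::Open(name) => stack.push(*name),
            Tag::Close(name) => match stack.iter().rposition(|open| open == name) {
                Some(pos) => {
                    if pos + 1 != stack.len() {
                        recovered += 1; // implicitly closed inner elements
                    }
                    stack.truncate(pos);
                }
                None => recovered += 1, // stray end tag, ignored
            },
        }
    }
    recovered
}

fn main() {
    // Same shape as the reported failure: </svg> arrives while <symbol> is open.
    let tags = [Tag::Open("svg"), Tag::Open("symbol"), Tag::Close("svg")];
    println!("strict:  {:?}", check_strict(&tags));
    println!("lenient: {} end tag(s) recovered", check_lenient(&tags));
}
```

The strict checker stops at the first end tag that does not match the innermost open element, which is the situation the reported error describes. The lenient one treats the same input as recoverable, which is roughly what browsers and standard-following parsers do.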