Skyscraper
Rust HTML Scraping with XPath Expressions
While comparing different HTML parsing libraries, I found that Skyscraper fails in some cases where others (sxd_html, or a Swift one) work fine. ```rust let link = "https://livejournal.com/"; let response...
``` [dependencies] serde = { version = "1.0", features = ["derive"] } serde_json = "1.0" skyscraper = "0.7.0-beta.1" reqwest = { version = "0.12.4", features = ["default", "blocking", "cookies", "json",...
Can you add unescaping for `' ` and convert `\n\n` to a space? For example: Bitcoin`'`s => Bitcoin's. Example text: https://www.binance.com/en/square/post/2024-07-07-cryptocurrencies-rally-amid-unexpected-u-s-unemployment-rate-rise-10483746406866 `According to U.Today, cryptocurrencies including Bitcoin (BTC), Dogecoin (DOGE), XRP, and Cardano (ADA)...
There are some `unwrap` and `expect` calls not only in tests but also in non-test code. This prevents the library from being used in apps with high crash-free standards.
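To illustrate the kind of change being asked for, here is a minimal sketch of a parsing helper that reports malformed input through `Result` instead of panicking. The `parse_attribute` function and `AttrError` type are hypothetical names made up for this example; they are not part of Skyscraper's API.

```rust
use std::fmt;

/// Hypothetical error type for this sketch; not Skyscraper's actual error type.
#[derive(Debug, PartialEq)]
enum AttrError {
    MissingEquals,
    EmptyName,
}

impl fmt::Display for AttrError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            AttrError::MissingEquals => write!(f, "attribute has no '=' separator"),
            AttrError::EmptyName => write!(f, "attribute name is empty"),
        }
    }
}

impl std::error::Error for AttrError {}

/// Splits `name=value` into its parts, reporting malformed input as an error
/// instead of panicking via `unwrap` or `expect`.
fn parse_attribute(input: &str) -> Result<(&str, &str), AttrError> {
    // `ok_or` turns the `None` case into an error the caller can handle.
    let (name, value) = input.split_once('=').ok_or(AttrError::MissingEquals)?;
    let name = name.trim();
    if name.is_empty() {
        return Err(AttrError::EmptyName);
    }
    // Strip surrounding whitespace and double quotes from the value.
    Ok((name, value.trim().trim_matches('"')))
}

fn main() {
    // The caller decides how to react; nothing here can panic on bad input.
    match parse_attribute("href=\"https://example.com\"") {
        Ok((name, value)) => println!("{name} = {value}"),
        Err(err) => eprintln!("could not parse attribute: {err}"),
    }
}
```

With this shape, an application can log the error or skip the offending document, which is usually what a crash-free requirement calls for.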
This HTML code is very old, but it still exists and causes a new error on this site with non-standard tags. Can you make a correction? Error: thread...
I have some experience with parsing using XPath, and I was very disappointed that there isn't a proper crate for parsing websites in Rust. Previously, I used what is practically...
**Work in progress...** A huge rewrite of the html module that follows the [HTML standard](https://html.spec.whatwg.org/multipage/parsing.html). The goal of this rewrite is to bring the html parsing closer to how browsers...
This implements the `Clone` trait for Xpath so it can be used with the `cached` macro from the cached crate. This PR is against 0.6.4 because the master branch currently...
The `unescape_characters` method is slow, so the number of places where it is called has been reduced. Ideally its performance would be improved, but the html module is in the process of being...
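As a rough illustration of one way unescaping can be made cheaper, the sketch below does a single pass over the input and skips allocation entirely when the text contains no `&`. It is not Skyscraper's `unescape_characters`; `unescape_basic` is a hypothetical helper, and the short entity table (including `&#39;` for the apostrophe) is an assumption chosen for the example.

```rust
use std::borrow::Cow;

/// Hypothetical helper, not Skyscraper's `unescape_characters`.
/// Handles only a handful of entities, for illustration.
fn unescape_basic(input: &str) -> Cow<'_, str> {
    // Fast path: no '&' means nothing to unescape, so borrow the input as-is.
    if !input.contains('&') {
        return Cow::Borrowed(input);
    }

    // Assumed entity table for this sketch; a real parser needs many more.
    const ENTITIES: [(&str, char); 5] = [
        ("&amp;", '&'),
        ("&lt;", '<'),
        ("&gt;", '>'),
        ("&quot;", '"'),
        ("&#39;", '\''),
    ];

    let mut out = String::with_capacity(input.len());
    let mut rest = input;
    while let Some(pos) = rest.find('&') {
        // Copy everything before the '&' unchanged.
        out.push_str(&rest[..pos]);
        rest = &rest[pos..];
        match ENTITIES.iter().find(|(name, _)| rest.starts_with(*name)) {
            Some((name, ch)) => {
                out.push(*ch);
                rest = &rest[name.len()..];
            }
            None => {
                // Unknown entity or bare '&': keep it literally and move on.
                out.push('&');
                rest = &rest[1..];
            }
        }
    }
    out.push_str(rest);
    Cow::Owned(out)
}

fn main() {
    println!("{}", unescape_basic("Bitcoin&#39;s rally &amp; more"));
}
```

Returning `Cow<'_, str>` lets callers treat both cases uniformly while paying for a new `String` only when an entity is actually present.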