Skyscraper issues

Fail to parse html

1

I was comparing different libraries for parsing HTML and found that Skyscraper fails in some cases where others (sxd_html, or one written in Swift) work fine.
```rust
let link = "https://livejournal.com/";
let response...
```

NightBlaze

Retrieving tag attributes with xpath in 0.7.0-beta2

3

```toml
[dependencies]
serde = { version = "1.0", features = ["derive"] }
serde_json = "1.0"
skyscraper = "0.7.0-beta.1"
reqwest = { version = "0.12.4", features = ["default", "blocking", "cookies", "json",...
```

RustGrow

Unescape some new characters

1

Could you add unescaping for `&#x27;` (the apostrophe) and replace `\n\n` with a space? For example, `Bitcoin&#x27;s` should become `Bitcoin's`. Example text from https://www.binance.com/en/square/post/2024-07-07-cryptocurrencies-rally-amid-unexpected-u-s-unemployment-rate-rise-10483746406866: "According to U.Today, cryptocurrencies including Bitcoin (BTC), Dogecoin (DOGE), XRP, and Cardano (ADA)...

RustGrow
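The unescaping requested in the issue above can be sketched with the standard library alone. This is a minimal sketch, assuming a standalone post-processing step on extracted text; `unescape_numeric` is a hypothetical helper, not part of Skyscraper's API. It decodes hexadecimal (`&#x27;`) and decimal (`&#39;`) numeric character references and then replaces each `\n\n` with a space.

```rust
/// Hypothetical helper, not Skyscraper's API: decodes hexadecimal (`&#x27;`)
/// and decimal (`&#39;`) numeric character references, then replaces every
/// `\n\n` with a single space. Named entities such as `&amp;` are left as-is.
fn unescape_numeric(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    let mut rest = input;
    while let Some(start) = rest.find("&#") {
        out.push_str(&rest[..start]);
        let after = &rest[start + 2..];
        // Try to decode "&#...;" into a char; None means it is not a valid reference.
        let decoded = after.find(';').and_then(|end| {
            let body = &after[..end];
            let parsed = match body.strip_prefix('x').or_else(|| body.strip_prefix('X')) {
                Some(hex) => u32::from_str_radix(hex, 16).ok(),
                None => body.parse::<u32>().ok(),
            };
            let c = char::from_u32(parsed?)?;
            Some((c, end))
        });
        match decoded {
            Some((c, end)) => {
                out.push(c);
                rest = &after[end + 1..];
            }
            None => {
                // Not a reference: keep "&#" literally and continue after it.
                out.push_str("&#");
                rest = after;
            }
        }
    }
    out.push_str(rest);
    out.replace("\n\n", " ")
}
```

For example, `unescape_numeric("Bitcoin&#x27;s rally\n\nBTC")` returns `"Bitcoin's rally BTC"`. Decoding every numeric reference, rather than special-casing `&#x27;`, also covers other characters a site might encode the same way.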

`unwrap` and `expect` in the code

1

There are some `unwrap` and `expect` calls not only in tests but also in library code. This prevents the library from being used in apps with strict crash-free requirements.

NightBlaze
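One common way to remove panics from library code is to return a `Result` and let callers propagate it with `?`. The sketch below assumes a hypothetical `tag_name` function and `ParseError` type; neither comes from Skyscraper's source.

```rust
use std::fmt;

/// Hypothetical error type for this sketch; Skyscraper's real errors differ.
#[derive(Debug, PartialEq)]
enum ParseError {
    MissingClosingBracket { position: usize },
}

impl fmt::Display for ParseError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ParseError::MissingClosingBracket { position } => {
                write!(f, "missing '>' (input ends at byte {position})")
            }
        }
    }
}

impl std::error::Error for ParseError {}

/// Returns the tag name at the start of `input`, e.g. "<div>" -> "div".
/// A panicking version would call `input.find('>').unwrap()`; here a missing
/// '>' becomes an `Err` the caller can handle.
fn tag_name(input: &str) -> Result<&str, ParseError> {
    let end = input
        .find('>')
        .ok_or(ParseError::MissingClosingBracket { position: input.len() })?;
    Ok(input[..end].trim_start_matches('<'))
}

/// `?` forwards the first error to the caller instead of crashing the app.
fn tag_names<'a>(first: &'a str, second: &'a str) -> Result<(&'a str, &'a str), ParseError> {
    Ok((tag_name(first)?, tag_name(second)?))
}
```

With this shape, `tag_name("<div")` returns `Err(ParseError::MissingClosingBracket { position: 4 })` instead of panicking, so an app can log the failure and continue.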

Fix non-standard tags

5

This HTML is very old, but it still exists, and its non-standard tags cause a new error on this site. Could you fix this? Error: thread...

RustGrow

Simple API

4

I have some experience parsing with XPath, and I was very disappointed that there isn't a proper crate for parsing websites in Rust. Previously, I used what is practically...

RustGrow

refactor(html)!: html module rewrite

**Work in progress...** A huge rewrite of the html module that follows the [HTML standard](https://html.spec.whatwg.org/multipage/parsing.html). The goal of this rewrite is to bring the html parsing closer to how browsers...

James-LG
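The HTML standard specifies tokenization as a state machine (data state, tag open state, end tag open state, tag name state, and so on). The following is a heavily simplified sketch of that idea, not the rewrite's actual code: it covers only a handful of those states, lowercases ASCII tag names as the spec does, and skips attributes, comments, DOCTYPEs, and character references. The `Token` and `State` types and the `tokenize` and `emit_tag` functions are hypothetical.

```rust
/// Tokens produced by this toy tokenizer.
#[derive(Debug, PartialEq)]
enum Token {
    StartTag(String),
    EndTag(String),
    Text(String),
}

/// A small subset of the tokenizer states defined by the HTML standard, plus
/// `SkipToTagEnd`, a simplification that ignores everything after the tag
/// name (attributes, a self-closing `/`) up to the closing `>`.
#[derive(Clone, Copy)]
enum State {
    Data,
    TagOpen,
    EndTagOpen,
    TagName,
    SkipToTagEnd,
}

/// Flushes pending text, then emits the tag if a name was collected.
fn emit_tag(tokens: &mut Vec<Token>, text: &mut String, name: &mut String, is_end: bool) {
    if !text.is_empty() {
        tokens.push(Token::Text(std::mem::take(text)));
    }
    if !name.is_empty() {
        let tag = std::mem::take(name);
        tokens.push(if is_end { Token::EndTag(tag) } else { Token::StartTag(tag) });
    }
}

fn tokenize(input: &str) -> Vec<Token> {
    let (mut tokens, mut text, mut name) = (Vec::new(), String::new(), String::new());
    let mut state = State::Data;
    let mut is_end = false;

    for c in input.chars() {
        match state {
            State::Data => match c {
                '<' => state = State::TagOpen,
                _ => text.push(c),
            },
            State::TagOpen => match c {
                '/' => state = State::EndTagOpen,
                _ if c.is_ascii_alphabetic() => {
                    is_end = false;
                    name.push(c.to_ascii_lowercase());
                    state = State::TagName;
                }
                // Not a tag: the '<' becomes text and `c` is reconsumed in the
                // data state (so a second '<' opens a new tag).
                '<' => text.push('<'),
                _ => {
                    text.push('<');
                    text.push(c);
                    state = State::Data;
                }
            },
            State::EndTagOpen => {
                if c.is_ascii_alphabetic() {
                    is_end = true;
                    name.push(c.to_ascii_lowercase());
                    state = State::TagName;
                } else if c == '>' {
                    state = State::Data; // "</>" is dropped, as in the spec.
                } else {
                    state = State::SkipToTagEnd; // Spec: bogus comment; simplified here.
                }
            }
            State::TagName => match c {
                '>' => {
                    emit_tag(&mut tokens, &mut text, &mut name, is_end);
                    state = State::Data;
                }
                _ if c.is_ascii_whitespace() || c == '/' => state = State::SkipToTagEnd,
                _ => name.push(c.to_ascii_lowercase()),
            },
            State::SkipToTagEnd => {
                if c == '>' {
                    emit_tag(&mut tokens, &mut text, &mut name, is_end);
                    state = State::Data;
                }
            }
        }
    }
    // At end of input the spec emits a dangling "<" or "</" as text and
    // drops any unfinished tag.
    match state {
        State::TagOpen => text.push('<'),
        State::EndTagOpen => text.push_str("</"),
        _ => {}
    }
    if !text.is_empty() {
        tokens.push(Token::Text(text));
    }
    tokens
}
```

With this sketch, `tokenize("a < b")` yields a single text token `"a < b"`, mirroring the spec's rule that a `<` not followed by a tag name is emitted as text.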

add clone to Xpath

1

This adds the `Clone` trait to `Xpath` so it can be used with the `cached` macro from the `cached` crate. This PR targets 0.6.4 because the master branch currently...

claudenobs
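As the PR notes, the `cached` macro needs `Xpath` to be `Clone`. The reason is that a memoizing cache keeps ownership of each stored value and returns a copy to the caller. The std-only sketch below illustrates that pattern with a hypothetical stand-in `Xpath` struct, a hypothetical `parse_xpath` function, and a hand-written `XpathCache`; it uses neither the `cached` crate nor Skyscraper's real types.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for Skyscraper's `Xpath`; the real type holds a
/// parsed expression rather than the source string.
#[derive(Debug, Clone, PartialEq)]
struct Xpath {
    source: String,
}

/// Hypothetical parser used only for this sketch.
fn parse_xpath(expr: &str) -> Xpath {
    Xpath { source: expr.to_string() }
}

/// Memoizes parsed expressions. The cache keeps ownership of each stored
/// value and hands callers a clone, which is why `Xpath` must be `Clone`.
struct XpathCache {
    entries: HashMap<String, Xpath>,
    misses: usize,
}

impl XpathCache {
    fn new() -> Self {
        XpathCache { entries: HashMap::new(), misses: 0 }
    }

    fn get(&mut self, expr: &str) -> Xpath {
        if let Some(hit) = self.entries.get(expr) {
            return hit.clone();
        }
        self.misses += 1;
        let parsed = parse_xpath(expr);
        self.entries.insert(expr.to_string(), parsed.clone());
        parsed
    }
}
```

Calling `cache.get("//div")` twice parses the expression once; the second call returns a clone of the stored value.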

fix(html): Reduce calls to unescape_characters which is slow

The `unescape_characters` method is slow, so the number of places that call it has been reduced. Ideally its performance would be improved, but the html module is in the process of being...

James-LG
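The PR reduces how often `unescape_characters` is called. A complementary technique, not described in the PR, is a cheap guard: text without an `&` cannot contain a character reference, so it can be returned borrowed without running the slow path. In this sketch `unescape_slow` is a hypothetical stand-in that decodes only four named entities, and `unescape_if_needed` is the hypothetical guard.

```rust
use std::borrow::Cow;

/// Hypothetical stand-in for a slow unescaping routine. It decodes only four
/// named entities; `&amp;` is replaced last so "&amp;lt;" becomes "&lt;"
/// rather than being decoded twice.
fn unescape_slow(text: &str) -> String {
    text.replace("&lt;", "<")
        .replace("&gt;", ">")
        .replace("&quot;", "\"")
        .replace("&amp;", "&")
}

/// Text without '&' cannot contain a character reference, so it is returned
/// borrowed and the slow path (and its allocations) is skipped.
fn unescape_if_needed(text: &str) -> Cow<'_, str> {
    if text.contains('&') {
        Cow::Owned(unescape_slow(text))
    } else {
        Cow::Borrowed(text)
    }
}
```

Returning `Cow<str>` means the common case of plain text performs no allocation, while text that does contain `&` still goes through the full routine.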
