James comments

Results: 9 comments by James

Fail to parse html

I'm currently working on a rewrite of the HTML module. It will follow the official HTML standard as defined by https://html.spec.whatwg.org/multipage/parsing.html. Hopefully that will solve your issues. It's a lot...

Retrieving tag attributes with xpath in 0.7.0-beta2

You can get attributes using an xpath expression (`/@href`) as documented in the [xpath module](https://docs.rs/skyscraper/0.7.0-beta.2/skyscraper/xpath/index.html#example-get-links-with-the-href-xpath-step).

Simple api

> I have some experience with parsing using xpath and I was very disappointed that there isn’t a proper crate for parsing websites in Rust. I'll start by saying I...

Simple api

v0.7.0-beta.1 has addressed some of these feature requests. See #42 for details.

Fix non standard tags

Skyscraper generally doesn't care what your tags are called. That error means there is something like `hi` somewhere in the raw HTML. The opening tag `` has to be matched...

Fix non standard tags

Can you provide a snippet of the html causing your errors? I don't like clicking links to websites I've never heard of.

Unescape some new characters

1. `&amp;` is already unescaped to `&` using [unescape_characters](https://docs.rs/skyscraper/latest/skyscraper/html/fn.unescape_characters.html).
2. `#x27;` is not a valid escape sequence as far as I can tell.
3. `\n\n` is two new lines, not...

`unwrap` and `expect` in the code

Yes, the readme does say:

> This library is major-version 0 because there are still `todo!` calls for many xpath features. If you encounter one that you feel should be...

add clone to Xpath

The html module has lots of problems since I hacked it together years ago. I've been working on rewriting it for the past few months which I published to #51...