Select inside <noscript>

Open pitdicker opened this issue 2 years ago • 9 comments

Elements within a <noscript> tag seem to be ignored by default. Is there a workaround?

Example

use scraper::{Html, Selector};

fn main() {
    let fragment = Html::parse_fragment("<noscript><h1>Hello, world!</h1></noscript>");
    let selector = Selector::parse("h1").unwrap();

    let h1 = fragment.select(&selector).next().unwrap();

    assert_eq!("<h1>Hello, world!</h1>", h1.html());
}

Current output:

thread 'main' panicked at 'called `Option::unwrap()` on a `None` value', src/main.rs:7:48
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

pitdicker avatar Mar 19 '23 18:03 pitdicker

The parsed result is:

Html {
    errors: [],
    quirks_mode: NoQuirks,
    tree: Tree { Fragment => { Element(<html>) => { Element(<noscript>) => { Text(Tendril<UTF8>(owned: "<h1>Hello, world!</h1>")) } } } },
}

It seems that the content inside the <noscript> is treated as Text, even though it is valid HTML.

oovm avatar Apr 18 '23 12:04 oovm

Might this be an issue in how we configure html5ever?

teymour-aldridge avatar Apr 18 '23 13:04 teymour-aldridge

Probably; see scripting_enabled in TreeBuilderOpts: https://docs.rs/html5ever/0.26.0/html5ever/tree_builder/struct.TreeBuilderOpts.html#structfield.scripting_enabled
The parse_* functions would need to take an extra argument, though.

nathaniel-daniel avatar Sep 27 '23 02:09 nathaniel-daniel