scraper

Unable to get html element dc:language

Open barun511 opened this issue 1 month ago • 3 comments

As the title says, the pseudo-class language apparently isn't supported. All I want is to get the XML element dc:language, so I'm forced into a really ugly hack: replace dc:language with something else, find the element I want, and then undo the replacement.

        let lang_selector = Selector::parse("dc:language").unwrap();

Running this panics with:

        called `Result::unwrap()` on an `Err` value: UnexpectedSelectorParseError(UnsupportedPseudoClassOrElement("language"))
        note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

barun511, Nov 15 '25 00:11

I don't think language is a pseudo-class here; the selector parser just reads the colon in dc:language as the start of one, hence the error. If you are processing XML, dc is probably a namespace prefix, and you will want to look at how CSS handles XML namespaces, e.g. https://developer.mozilla.org/en-US/docs/Web/CSS/Guides/Namespaces

adamreichold, Nov 15 '25 06:11

Sorry, I don't know much about pseudo-classes. The only point I was making is that I have an XML element like

<dc:language>en</dc:language>

and I want to select it.

barun511, Nov 15 '25 17:11

You have to figure out which namespace the dc prefix refers to and construct a corresponding selector that uses the namespace separator. (If you do not care about the namespace, just selecting language might also work.)

adamreichold, Nov 15 '25 18:11
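The first step, figuring out which namespace the dc prefix refers to, amounts to a prefix lookup. Below is a minimal sketch of that lookup only. split_qname and resolve are hypothetical helpers (not part of scraper), the prefix-to-URI table is assumed to have been collected from the document's xmlns:* declarations, and the Dublin Core URI is an assumed example binding rather than something stated in this thread.

```rust
use std::collections::HashMap;

/// Splits a qualified XML name such as "dc:language" into an optional
/// prefix and the local name.
fn split_qname(qname: &str) -> (Option<&str>, &str) {
    match qname.split_once(':') {
        Some((prefix, local)) => (Some(prefix), local),
        None => (None, qname),
    }
}

/// Hypothetical helper: looks up the namespace URI bound to the prefix of
/// `qname` in a prefix -> URI table (assumed to come from the xmlns:*
/// attributes of the document). Returns the URI, if the prefix is bound,
/// together with the local name.
fn resolve<'a>(
    qname: &'a str,
    bindings: &'a HashMap<String, String>,
) -> (Option<&'a str>, &'a str) {
    let (prefix, local) = split_qname(qname);
    let uri = prefix.and_then(|p| bindings.get(p)).map(String::as_str);
    (uri, local)
}

fn main() {
    let mut bindings = HashMap::new();
    // Assumed binding: dc as the Dublin Core elements prefix.
    bindings.insert(
        "dc".to_string(),
        "http://purl.org/dc/elements/1.1/".to_string(),
    );

    let (uri, local) = resolve("dc:language", &bindings);
    println!("{:?} {}", uri, local);
}
```

This covers only the lookup. How the resolved namespace is then used in a selector depends on scraper's namespace support, which the sketch does not assume.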


By the way, I was able to select an element with a namespaced tag by escaping the colon with a backslash in the selector (the same approach as in jQuery: https://stackoverflow.com/a/11502677):

        let app_version_selector = Selector::parse("im\\:version").unwrap();

kosyak, Dec 07 '25 10:12
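Building on kosyak's approach, a small helper could do the escaping instead of writing the backslashes by hand. This is a minimal sketch: escape_tag_name is a hypothetical name, not part of scraper, and it backslash-escapes every ASCII character other than letters, digits, '-' and '_'. That is simpler than the full CSS escaping rules (it does not special-case a leading digit or control characters).

```rust
/// Hypothetical helper: backslash-escapes characters that would otherwise
/// be read as selector syntax (such as the ':' in a prefixed tag name),
/// so the tag name can be used as a type selector.
/// Simplified: does not handle a leading digit or control characters.
fn escape_tag_name(tag: &str) -> String {
    let mut out = String::with_capacity(tag.len());
    for c in tag.chars() {
        // ASCII letters, digits, '-', '_' and non-ASCII characters are
        // valid identifier characters and pass through unchanged.
        if c.is_ascii_alphanumeric() || c == '-' || c == '_' || !c.is_ascii() {
            out.push(c);
        } else {
            // Any other ASCII character, including ':', gets a backslash.
            out.push('\\');
            out.push(c);
        }
    }
    out
}

fn main() {
    // The Rust literal "dc\\:language" is the selector text dc\:language.
    let selector = escape_tag_name("dc:language");
    println!("{}", selector);
}
```

With scraper, the result would then be passed as Selector::parse(&escape_tag_name("dc:language")). This presumably works because scraper parses documents with an HTML parser, which keeps the prefix as part of the element name instead of resolving it as an XML namespace.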