
Matching Order in LxmlMicrodataExtractor._extract_property_value

Open • kelvinso opened this issue 5 years ago • 1 comment

I noticed that the matching order in _extract_property_value seems to be inconsistent with https://www.w3.org/TR/microdata/#values. That document lists "If the element has a content attribute" as the second matching case, whereas LxmlMicrodataExtractor._extract_property_value checks it second to last.
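To illustrate why the order matters, here is a minimal sketch of the spec's "first matching case wins" order. It assumes Python's standard-library xml.etree.ElementTree rather than lxml, and property_value_spec_order is a hypothetical helper, not extruct's API. Nested items (itemscope) are deliberately left out.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# Elements whose value is a URL taken from one attribute (per the spec's list).
URL_ATTRS = {
    "audio": "src", "embed": "src", "iframe": "src", "img": "src",
    "source": "src", "track": "src", "video": "src",
    "a": "href", "area": "href", "link": "href",
    "object": "data",
}


def property_value_spec_order(node, base_url=""):
    """Hypothetical helper: first matching case wins, in the order of
    https://www.w3.org/TR/microdata/#values (nested items not handled)."""
    if node.get("itemscope") is not None:
        raise NotImplementedError("nested itemscope items are out of scope here")
    # Second case: a content attribute on ANY element, checked before the
    # meta and URL cases. An empty content="" still counts here.
    content = node.get("content")
    if content is not None:
        return content
    tag = node.tag.lower()
    if tag == "meta":
        # A meta element without a content attribute yields the empty string.
        return ""
    if tag in URL_ATTRS:
        raw = node.get(URL_ATTRS[tag])
        # Resolve relative URLs against the base; a missing attribute gives "".
        return urljoin(base_url, raw) if raw is not None else ""
    if tag in ("data", "meter"):
        return node.get("value", "")
    if tag == "time" and node.get("datetime") is not None:
        return node.get("datetime")
    # Otherwise (including time without datetime): the element's text content.
    return "".join(node.itertext())


html = '<a itemprop="url" href="/page" content="https://example.com/canonical">link</a>'
print(property_value_spec_order(ET.fromstring(html), "https://example.com/"))
# → https://example.com/canonical
```

With this order, an a element that carries both href and content yields the content value; an implementation that checks content near the end, as described above, would return the resolved href instead.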

Should this case

    elif node.get("content"):
        return node.get("content")

in w3cmicrodata.py be moved before the check that resolves meta tags at line 186?

Thanks a lot! Kelvin

kelvinso • Nov 20 '20 04:11

Yes, it looks like the changes made to the specification since 2013 (that code dates from 2014) include allowing a content attribute on any element. Back in 2013 this was non-standard, although extruct already supported it.

We should probably review the specification changes more broadly; there may be more surprises.

Gallaecio • Feb 21 '21 16:02