html5ever
html5ever copied to clipboard
xmlns-prefixed attribute on root element is lost
<!-- before -->
<svg xmlns="http://www.w3.org/2000/svg" xmlns:sodipodi="http://inkscape.sourceforge.net/DTD/sodipodi-0.dtd">
<sodipodi:namedview>
...
</sodipodi:namedview>
<path d="..." sodipodi:nodetypes="cccc"/>
</svg>
<!-- after: xmlns:sodipodi moved to child -->
<svg xmlns="http://www.w3.org/2000/svg">
<sodipodi:namedview xmlns:sodipodi="http://inkscape.sourceforge.net/DTD/sodipodi-0.dtd">
...
</sodipodi:namedview>
<path d="..." sodipodi:nodetypes="cccc"></path>
</svg>
<!-- before -->
<svg xmlns="http://www.w3.org/2000/svg" xmlns:sodipodi="http://inkscape.sourceforge.net/DTD/sodipodi-0.dtd">
<path d="..." sodipodi:nodetypes="cccc"/>
</svg>
<!-- after: xmlns:sodipodi removed entirely -->
<svg xmlns="http://www.w3.org/2000/svg">
<path d="..." sodipodi:nodetypes="cccc"></path>
</svg>
Originally posted by @noahbald in https://github.com/servo/html5ever/issues/538#issuecomment-2506784772
It looks like the parsing code stores the namespace data for the node in the treebuilder's namespace stack, but the data about which tag declares a namespace is thrown away. Instead we store the namespace uri for child tags; this is why the xmlns-prefixed attribute jumps to the first child that uses the namespace.
Parsing code that strips out the xmlns attribute: https://github.com/servo/html5ever/blob/feced062a9119d6f8279cf9d002f9a93926a8ca4/xml5ever/src/tree_builder/mod.rs#L347 Parsing code that stores the namespace URI on child tags: https://github.com/servo/html5ever/blob/feced062a9119d6f8279cf9d002f9a93926a8ca4/xml5ever/src/tree_builder/mod.rs#L297 Serialization code that uses the namespace URI: https://github.com/servo/html5ever/blob/feced062a9119d6f8279cf9d002f9a93926a8ca4/xml5ever/src/serialize/mod.rs#L111
Testcase in rcdom/tests/xml-driver.rs:
#[test]
fn weird() {
assert_serialization(
r##"<svg xmlns="http://www.w3.org/2000/svg" xmlns:sodipodi="http://inkscape.sourceforge.net/DTD/sodipodi-0.dtd">
<sodipodi:namedview>
...
</sodipodi:namedview>
<path d="..." sodipodi:nodetypes="cccc"/>
</svg>"##,
driver::parse_document(RcDom::default(), Default::default())
.from_utf8()
.one(r##"<svg xmlns="http://www.w3.org/2000/svg" xmlns:sodipodi="http://inkscape.sourceforge.net/DTD/sodipodi-0.dtd">
<sodipodi:namedview>
...
</sodipodi:namedview>
<path d="..." sodipodi:nodetypes="cccc"/>
</svg>"##.as_bytes()),
);
}
Current status: I have changes that allow these testcases to pass. However, they cause a lot of xml5lib namespace tests to fail, and https://github.com/Ygg01/xml5lib-tests/commit/3fb8d2b6bc4295b1280cbca7a87ad3fb5cae93f3 suggests that the current behaviour is intentional. I'm hoping to get in touch with @Ygg01 to better understand what the right behaviour here is.
Branch: https://github.com/servo/html5ever/compare/xmlns-attrs?expand=1
Current status: I have changes that allow these testcases to pass. However, they cause a lot of xml5lib namespace tests to fail, and Ygg01/xml5lib-tests@3fb8d2b suggests that the current behaviour is intentional. I'm hoping to get in touch with @Ygg01 to better understand what the right behaviour here is.
Hi Josh. To be honest I don't quite remember the motivation. It's been like 8 years.
The important question is does it goes against XML namespace spec? If not, just change the tests. I'm open to PRs on xml5lib-tests
@jdm what are the next steps to close this?
- Do you want me to fix the xml5lib-tests?
- Move xml5lib-tests to the Servo repository, so you can take control over it?
- Make PRs to Ygg01/xm5lib-tests for me to review?
Hey if it helps I've found a few documents which will become invalid/broken after parsing and serializing and then viewing in Firefox. Perhaps these can help?
Arch Linux Logo Blobs Isometric Madness tldr-pages Banner Wikipedia Logo
@jdm I've updated the xml5lib tests. New commit. Might require submodule update.
Looking back I think I'm starting to remember why I did it like that. And why I was wrong.
- When converting XML ⇾ tree, the prefixes aren't kept. In fact, they are completely arbitrary. They are like temporary variables that are discarded after parsing, e.g.
svg:width = "10"becomes{http://www.w3.org/2000/svg}width = "10"in tree. - When converting namespaces, the namespace definitions could be used by simplistic libs, to match on prefix, by scanning for namespace definitions, which is something that should never be done. You search by namespaces, not prefix. I omitted them to prevent such use cases. Trying to be correct by design.
- I didn't consider someone would serialize and deserialize and expect similar/same results.
I opened #586, which is @jdm's branch plus a submodule update for xml5lib-tests. Looks like everything is passing.
@Ygg01 In the new xml5lib test commit. I notice that the tags been renamed from <a> to <aN> where N is some number. For some of the tests the corresponding </a> has also been renamed to </aN> but for some of them the </a> is still just </a>. Was that intentional / does it matter? I just thought I'd raise the inconsistency as it looks like it could be a potential bug.
I notice that the tags been renamed from
<a>to<aN>
@nicoburns Yes. It made it easier to find which of the tests failed. I had to debug most of them and getting error to show which test failed helped, what could happen is that I missed some end tag.
EDIT: Updated tags to and created a new commit https://github.com/Ygg01/xml5lib-tests/commit/a6d626edfc2e79246d26185b7f7ac23945a307f9
Reverted. We'll need to investigate test failures in Servo with #586 applied before it can merge again (https://github.com/servo/servo/pull/36284#issuecomment-2784279593).
Reverted. We'll need to investigate test failures in Servo with #586 applied before it can merge again (servo/servo#36284 (comment)).
To copy my reply from there, it seems now nodes were serializing the namespaces when they didn't need to😕 Maybe a default setting mismatch.
Also builds passed on Servo. I assume that's done by removing it from the merge queue.
Yeah, I published a new version of xml5ever that reverted the problem change.
I'm not too familiar with the servo setup, why would we see namespaces serialized in these tests?
Is this issue still open? Didn't changes to markup5ever fix it?