html5ever icon indicating copy to clipboard operation
html5ever copied to clipboard

xmlns-prefixed attribute on root element is lost

Open jdm opened this issue 10 months ago • 12 comments

<!-- before -->
<svg xmlns="http://www.w3.org/2000/svg" xmlns:sodipodi="http://inkscape.sourceforge.net/DTD/sodipodi-0.dtd">
    <sodipodi:namedview>
        ...
    </sodipodi:namedview>

    <path d="..." sodipodi:nodetypes="cccc"/>
</svg>

<!-- after: xmlns:sodipodi moved to child -->
<svg xmlns="http://www.w3.org/2000/svg">
    <sodipodi:namedview xmlns:sodipodi="http://inkscape.sourceforge.net/DTD/sodipodi-0.dtd">
        ...
    </sodipodi:namedview>

    <path d="..." sodipodi:nodetypes="cccc"></path>
</svg>
<!-- before -->
<svg xmlns="http://www.w3.org/2000/svg" xmlns:sodipodi="http://inkscape.sourceforge.net/DTD/sodipodi-0.dtd">
    <path d="..." sodipodi:nodetypes="cccc"/>
</svg>

<!-- after: xmlns:sodipodi removed entirely -->
<svg xmlns="http://www.w3.org/2000/svg">
    <path d="..." sodipodi:nodetypes="cccc"></path>
</svg>

Originally posted by @noahbald in https://github.com/servo/html5ever/issues/538#issuecomment-2506784772

jdm avatar Jan 12 '25 03:01 jdm

It looks like the parsing code stores the namespace data for the node in the treebuilder's namespace stack, but the data about which tag declares a namespace is thrown away. Instead we store the namespace uri for child tags; this is why the xmlns-prefixed attribute jumps to the first child that uses the namespace.

jdm avatar Jan 12 '25 05:01 jdm

Parsing code that strips out the xmlns attribute: https://github.com/servo/html5ever/blob/feced062a9119d6f8279cf9d002f9a93926a8ca4/xml5ever/src/tree_builder/mod.rs#L347 Parsing code that stores the namespace URI on child tags: https://github.com/servo/html5ever/blob/feced062a9119d6f8279cf9d002f9a93926a8ca4/xml5ever/src/tree_builder/mod.rs#L297 Serialization code that uses the namespace URI: https://github.com/servo/html5ever/blob/feced062a9119d6f8279cf9d002f9a93926a8ca4/xml5ever/src/serialize/mod.rs#L111

jdm avatar Jan 12 '25 05:01 jdm

Testcase in rcdom/tests/xml-driver.rs:

#[test]
fn weird() {
    assert_serialization(
       r##"<svg xmlns="http://www.w3.org/2000/svg" xmlns:sodipodi="http://inkscape.sourceforge.net/DTD/sodipodi-0.dtd">
    <sodipodi:namedview>
        ...
    </sodipodi:namedview>

    <path d="..." sodipodi:nodetypes="cccc"/>
</svg>"##,
        driver::parse_document(RcDom::default(), Default::default())
            .from_utf8()
            .one(r##"<svg xmlns="http://www.w3.org/2000/svg" xmlns:sodipodi="http://inkscape.sourceforge.net/DTD/sodipodi-0.dtd">
    <sodipodi:namedview>
        ...
    </sodipodi:namedview>

    <path d="..." sodipodi:nodetypes="cccc"/>
</svg>"##.as_bytes()),
    );
}

jdm avatar Jan 12 '25 05:01 jdm

Current status: I have changes that allow these testcases to pass. However, they cause a lot of xml5lib namespace tests to fail, and https://github.com/Ygg01/xml5lib-tests/commit/3fb8d2b6bc4295b1280cbca7a87ad3fb5cae93f3 suggests that the current behaviour is intentional. I'm hoping to get in touch with @Ygg01 to better understand what the right behaviour here is.

jdm avatar Jan 12 '25 06:01 jdm

Branch: https://github.com/servo/html5ever/compare/xmlns-attrs?expand=1

jdm avatar Jan 12 '25 06:01 jdm

Current status: I have changes that allow these testcases to pass. However, they cause a lot of xml5lib namespace tests to fail, and Ygg01/xml5lib-tests@3fb8d2b suggests that the current behaviour is intentional. I'm hoping to get in touch with @Ygg01 to better understand what the right behaviour here is.

Hi Josh. To be honest I don't quite remember the motivation. It's been like 8 years.

The important question is does it goes against XML namespace spec? If not, just change the tests. I'm open to PRs on xml5lib-tests

Ygg01 avatar Jan 12 '25 12:01 Ygg01

@jdm what are the next steps to close this?

  • Do you want me to fix the xml5lib-tests?
  • Move xml5lib-tests to the Servo repository, so you can take control over it?
  • Make PRs to Ygg01/xm5lib-tests for me to review?

Ygg01 avatar Jan 22 '25 13:01 Ygg01

Hey if it helps I've found a few documents which will become invalid/broken after parsing and serializing and then viewing in Firefox. Perhaps these can help?

Arch Linux Logo Blobs Isometric Madness tldr-pages Banner Wikipedia Logo

noahbald avatar Feb 15 '25 06:02 noahbald

@jdm I've updated the xml5lib tests. New commit. Might require submodule update.

Looking back I think I'm starting to remember why I did it like that. And why I was wrong.

  1. When converting XML ⇾ tree, the prefixes aren't kept. In fact, they are completely arbitrary. They are like temporary variables that are discarded after parsing, e.g. svg:width = "10" becomes {http://www.w3.org/2000/svg}width = "10" in tree.
  2. When converting namespaces, the namespace definitions could be used by simplistic libs, to match on prefix, by scanning for namespace definitions, which is something that should never be done. You search by namespaces, not prefix. I omitted them to prevent such use cases. Trying to be correct by design.
  3. I didn't consider someone would serialize and deserialize and expect similar/same results.

Ygg01 avatar Feb 16 '25 12:02 Ygg01

I opened #586, which is @jdm's branch plus a submodule update for xml5lib-tests. Looks like everything is passing.

devongovett avatar Mar 23 '25 21:03 devongovett

@Ygg01 In the new xml5lib test commit. I notice that the tags been renamed from <a> to <aN> where N is some number. For some of the tests the corresponding </a> has also been renamed to </aN> but for some of them the </a> is still just </a>. Was that intentional / does it matter? I just thought I'd raise the inconsistency as it looks like it could be a potential bug.

nicoburns avatar Mar 26 '25 04:03 nicoburns

I notice that the tags been renamed from <a> to <aN>

@nicoburns Yes. It made it easier to find which of the tests failed. I had to debug most of them and getting error to show which test failed helped, what could happen is that I missed some end tag.

EDIT: Updated tags to and created a new commit https://github.com/Ygg01/xml5lib-tests/commit/a6d626edfc2e79246d26185b7f7ac23945a307f9

Ygg01 avatar Mar 26 '25 14:03 Ygg01

Reverted. We'll need to investigate test failures in Servo with #586 applied before it can merge again (https://github.com/servo/servo/pull/36284#issuecomment-2784279593).

jdm avatar Apr 08 '25 13:04 jdm

Reverted. We'll need to investigate test failures in Servo with #586 applied before it can merge again (servo/servo#36284 (comment)).

To copy my reply from there, it seems now nodes were serializing the namespaces when they didn't need to😕 Maybe a default setting mismatch.

BCompare_WyFmG3VoqV

Also builds passed on Servo. I assume that's done by removing it from the merge queue.

Ygg01 avatar Apr 08 '25 20:04 Ygg01

Yeah, I published a new version of xml5ever that reverted the problem change.

jdm avatar Apr 08 '25 21:04 jdm

I'm not too familiar with the servo setup, why would we see namespaces serialized in these tests?

Ygg01 avatar Apr 09 '25 07:04 Ygg01

Is this issue still open? Didn't changes to markup5ever fix it?

Ygg01 avatar Jul 19 '25 13:07 Ygg01