Implement full XML serialization for nodes

Open jdm opened this issue 6 years ago • 6 comments

The implementation in https://github.com/servo/html5ever/blob/master/xml5ever/src/serialize/mod.rs is fairly simplistic, and does not reflect the complexity described by https://w3c.github.io/DOM-Parsing/#dfn-xml-serialization.

jdm • Mar 18 '19 20:03

Ok, I'm down with doing this.

My biggest question is: should this also add fragment parsing (#271)? Also, while digging through the code I found #122, which might be tangentially related.

Right now I'm implementing parts of XML parsing. I'll probably leave fragment parsing for another time and possibly see about fixing #122 at an even later date.

Ygg01 • Mar 30 '19 12:03

I believe #122 is part of the full serialization algorithm, yes.

jdm • Mar 30 '19 13:03

https://github.com/servo/servo/issues/24920 looks like it's caused by a step in this algorithm: "Elements not in the HTML namespace containing no children, are serialized using the empty-element tag syntax (i.e., according to the XML EmptyElemTag production)."
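
To make the quoted step concrete, here is a minimal sketch of the rule in Rust. It is not html5ever's or Servo's code: the `Element` struct and the `serialize` function are hypothetical, and the sketch ignores attributes, namespace declarations, text escaping, and the special handling of HTML void elements.

```rust
/// The HTML namespace URI.
const HTML_NS: &str = "http://www.w3.org/1999/xhtml";

/// Hypothetical, simplified element type for illustration only.
struct Element {
    ns: &'static str,
    name: &'static str,
    children: Vec<Element>,
}

/// Appends the XML serialization of `el` to `out`.
fn serialize(el: &Element, out: &mut String) {
    out.push('<');
    out.push_str(el.name);

    // The quoted step: a childless element outside the HTML namespace
    // uses the empty-element tag syntax and gets no end tag.
    if el.children.is_empty() && el.ns != HTML_NS {
        out.push_str("/>");
        return;
    }

    // Everything else gets an explicit start tag, children, and end tag.
    out.push('>');
    for child in &el.children {
        serialize(child, out);
    }
    out.push_str("</");
    out.push_str(el.name);
    out.push('>');
}
```

Under these assumptions, a childless SVG `circle` comes out as `<circle/>`, while a childless HTML `div` keeps its explicit end tag.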

pshaughn • Dec 10 '19 17:12

@jdm yeah, I'm back on this issue. After a long time, I finally have time off. However, I've noticed a few problems.

  1. There is no Document or DocumentFragment type.
  2. To implement the problematic parts, namely computing the skip end tag flag in step 14, I need to know whether the node has children, which I can't get from the serializer unless some things are changed.

What would be a preferred solution?

For 1), I can see adding extra Node variants, like Node::Document and Node::Document_Fragment. For 2), I assume I need to create another method in Serializer akin to start_elem, e.g. write_elem(&mut self, name: QualName, attrs: AttrIter, leaf_node: bool).

Ygg01 • Jan 01 '20 04:01

Yeah, if we need to be able to represent more kinds of nodes then we should add those variants. As for question 2, modifying the Serializer trait to provide the information you require sounds reasonable. If we can pass an argument to start_elem instead of adding a new separate method that's only used by the XML serializer, that might be preferable.
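
As a rough illustration of the "extra argument to start_elem" option, the sketch below uses a hypothetical, simplified trait. `SketchSerializer`, `XmlWriter`, and `escape_attr` are made-up names, element names and attributes are plain strings rather than html5ever's `QualName` and attribute iterator, and the HTML-namespace check is left out.

```rust
use std::io::{self, Write};

/// Hypothetical, simplified stand-in for a serializer trait; not html5ever's actual API.
trait SketchSerializer {
    /// `leaf_node` is the extra argument under discussion: true when the element has no children.
    fn start_elem(&mut self, name: &str, attrs: &[(&str, &str)], leaf_node: bool) -> io::Result<()>;
    fn end_elem(&mut self, name: &str) -> io::Result<()>;
}

/// Minimal escaping for attribute values (assumption: only these three characters matter here).
fn escape_attr(value: &str) -> String {
    value
        .replace('&', "&amp;")
        .replace('"', "&quot;")
        .replace('<', "&lt;")
}

struct XmlWriter<W: Write> {
    writer: W,
    /// One entry per open element: true if its start tag was written as `<name/>`.
    skip_end_tag: Vec<bool>,
}

impl<W: Write> SketchSerializer for XmlWriter<W> {
    fn start_elem(&mut self, name: &str, attrs: &[(&str, &str)], leaf_node: bool) -> io::Result<()> {
        write!(self.writer, "<{}", name)?;
        for (key, value) in attrs {
            write!(self.writer, " {}=\"{}\"", key, escape_attr(value))?;
        }
        // Remember the decision so the matching end_elem can skip the end tag.
        self.skip_end_tag.push(leaf_node);
        if leaf_node {
            self.writer.write_all(b"/>")
        } else {
            self.writer.write_all(b">")
        }
    }

    fn end_elem(&mut self, name: &str) -> io::Result<()> {
        match self.skip_end_tag.pop() {
            // The start tag was already self-closed; nothing to write.
            Some(true) => Ok(()),
            _ => write!(self.writer, "</{}>", name),
        }
    }
}
```

With this shape, callers keep pairing `start_elem` and `end_elem` as before, and only the serializer decides whether the end tag is actually emitted. A non-XML serializer could simply ignore the flag.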

jdm • Jan 03 '20 00:01

Has there been any progress on this, or is it still open for someone to pick up as a good starting point? If not, could you suggest another good first issue, @jdm? Thanks in advance.

ktfth • Nov 06 '22 00:11