pugixml

Empty text children increase XML file size

Open Mai1er opened this issue 2 years ago • 5 comments

```cpp
// ParentNode is an existing pugi::xml_node
pugi::xml_node Node = ParentNode.append_child("NewChildNode");
Node.text().set(""); // creates a new text-type (PCDATA) child that holds no text
// After saving the document to a file we get: <NewChildNode></NewChildNode>
// but with the default output format it should be: <NewChildNode />
// The "virtual" empty text child prevents the short (self-closing) form.
```

Mai1er, Jun 12 '23 02:06

Sure, but why is this a problem? It's not clear to me that changing this is worth it, since the application can simply avoid creating empty text children.

zeux, Jun 16 '23 04:06

I have text fields that are assigned from a template without checking whether they are empty. Each file has many fields, and often most of them are empty. This increases file size by 20-30%, and there are... a lot of files on disk.

Besides, why leave a mistake when you can do better?

Mai1er, Jun 16 '23 06:06
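To put a number on the per-field overhead: in pugixml's default (indented, non-raw) output, an empty element named with n characters takes 2n + 5 bytes in the expanded form <name></name> and n + 4 bytes in the short form <name /> (with format_raw the space is omitted, giving n + 3). So each empty text child costs n + 1 extra bytes. The sketch below only does this arithmetic; the 20-30% figure reported above depends on the actual files and is not derived here. Function names are illustrative.

```cpp
#include <cstddef>
#include <string>

// "<name></name>": '<' + name + '>' + "</" + name + '>'
std::size_t expanded_size(const std::string& name) { return 2 * name.size() + 5; }

// "<name />": '<' + name + " />" (default, non-raw output)
std::size_t compact_size(const std::string& name) { return name.size() + 4; }

// Extra bytes caused by one empty text child: equals name.size() + 1.
std::size_t overhead_per_field(const std::string& name) {
    return expanded_size(name) - compact_size(name);
}

// Estimate for `count` empty fields that share the same tag name.
std::size_t total_overhead(const std::string& name, std::size_t count) {
    return overhead_per_field(name) * count;
}
```

For example, each empty <NewChildNode> costs 13 extra bytes (29 versus 16).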

OK, and is it impractical for your application to check whether the field is empty before assigning it? What if the field value is purely whitespace?

Additionally, if your data has a lot of empty fields and the output size matters, do you need to add the nodes corresponding to empty fields at all? <NewChildNode /> still takes space, and you might be able to omit the node entirely.

Overall, I'm not entirely sure where the "mistake" is here. Maybe set() with an empty string shouldn't create a PCDATA node at all, or maybe the library is working as intended; it's an odd corner case.

zeux, Jun 16 '23 09:06

  1. That's not the application's responsibility.
  2. The empty fields are required by the format.

Mai1er, Jun 16 '23 14:06

This behavior is confusing because:

  1. You create an XML file with empty nodes; after saving with default flags, the empty nodes are written in expanded form.
  2. You load that file and save it again with default flags; now the empty nodes are compressed.

I found this out when comparing files with a diff tool; the inconsistency makes comparing such files with a diff tool impractical.

So if you want expanded empty nodes, you should use the existing format flag format_no_empty_element_tags. Without that flag, one would expect empty nodes to always be written in compressed form.

chr-thien, Feb 23 '24 11:02