Elements are ignored if a text node is present at the same level

Open rimutaka opened this issue 5 years ago • 2 comments

This issue describes an uncommon scenario where text and elements are mixed together at the same level. I have encountered it in the wild, but not in the context of XML to JSON conversion.

Example

Consider the following well-formed XML example:

<Root>
Some text is totally valid here
  <TaxRate>7.25</TaxRate>
  <Data>
  and also at this level
    <Category>A</Category>
    <Quantity>3</Quantity>
    <Price>24.50</Price>
  </Data>
</Root>

It has 2 instances of text and element nodes at the same level. The expected JSON would be:

{
  "Root": {
    "Data": {
      "Category": "A",
      "Price": 24.5,
      "Quantity": 3,
      "txt": "and also at this level"
    },
    "TaxRate": 7.25,
    "txt": "Some text is totally valid here"
  }
}

However, the code checks for the presence of a text node ( if el.text().trim() != "" { ...) and only handles child elements in the else branch of that check, so the JSON loses the elements:

{
  "Root": "Some text is totally valid here"
}

Solution

The solution would be to refactor fn convert_node in lib.rs to process the children recursively regardless of the presence of the text node.
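To make the idea concrete, here is a minimal sketch of that shape. It is not the crate's actual code: Node and Value are hypothetical stand-ins for the real XML element type and for serde_json::Value, and the "txt" key simply follows the expected JSON above. Type inference for numbers and the handling of repeated element names are left out, so all leaf values stay strings.

```rust
use std::collections::BTreeMap;

// Hypothetical stand-in for a parsed XML element (NOT the crate's real type).
#[derive(Debug)]
struct Node {
    name: String,
    text: String, // text found directly under this element
    children: Vec<Node>,
}

// Minimal JSON-like value, a stand-in for serde_json::Value.
#[derive(Debug, PartialEq)]
enum Value {
    Str(String),
    Obj(BTreeMap<String, Value>),
}

// Convert a node, keeping both its own text and its child elements.
fn convert_node(el: &Node) -> Value {
    let text = el.text.trim();

    // A leaf element becomes a plain string value.
    if el.children.is_empty() {
        return Value::Str(text.to_string());
    }

    let mut map = BTreeMap::new();

    // Assumption: mixed-content text is stored under a "txt" key.
    if !text.is_empty() {
        map.insert("txt".to_string(), Value::Str(text.to_string()));
    }

    // Children are processed whether or not text was present.
    // Simplification: a repeated child name overwrites the earlier entry,
    // and a child literally named "txt" would overwrite the text entry.
    for child in &el.children {
        map.insert(child.name.clone(), convert_node(child));
    }

    Value::Obj(map)
}

// Helper for building leaf elements in the example below.
fn leaf(name: &str, text: &str) -> Node {
    Node { name: name.to_string(), text: text.to_string(), children: vec![] }
}

// Builds the <Root> example from this issue by hand (no XML parsing here).
fn example_root() -> Node {
    Node {
        name: "Root".to_string(),
        text: "\nSome text is totally valid here\n  ".to_string(),
        children: vec![
            leaf("TaxRate", "7.25"),
            Node {
                name: "Data".to_string(),
                text: "\n  and also at this level\n    ".to_string(),
                children: vec![
                    leaf("Category", "A"),
                    leaf("Quantity", "3"),
                    leaf("Price", "24.50"),
                ],
            },
        ],
    }
}
```

Calling convert_node(&example_root()) yields an object for Root that holds "txt", "TaxRate" and a nested "Data" object with its own "txt", instead of collapsing Root into a single string.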

This is a low priority issue. No action is expected unless we actually have someone affected by it.

rimutaka, Oct 26 '20 03:10

I am unsure if this is related, but from what I can tell it seems to be...

I am working on parsing a PolyGlot save file, and have the output XML here:

<EtymologyCollection>
  <EtymologyInternalRelation>
    16
    <EtymologyInternalChild>
      13
    </EtymologyInternalChild>
  </EtymologyInternalRelation>
  <EtymologyInternalRelation>
    17
    <EtymologyInternalChild>
      13
    </EtymologyInternalChild>
  </EtymologyInternalRelation>
  <EtymologyInternalRelation>
    18
    <EtymologyInternalChild>
      13
    </EtymologyInternalChild>
  </EtymologyInternalRelation>
  <EtymologyInternalRelation>
    3
    <EtymologyInternalChild>
      6
    </EtymologyInternalChild>
  </EtymologyInternalRelation>
  <EtymologyInternalRelation>
    19
    <EtymologyInternalChild>
      13
    </EtymologyInternalChild>
  </EtymologyInternalRelation>
  <EtymologyInternalRelation>
    20
    <EtymologyInternalChild>
      13
    </EtymologyInternalChild>
  </EtymologyInternalRelation>
  <EtymologyInternalRelation>
    6
    <EtymologyInternalChild>
      5
    </EtymologyInternalChild>
  </EtymologyInternalRelation>
  <EtymologyInternalRelation>
    7
    <EtymologyInternalChild>
      10
    </EtymologyInternalChild>
  </EtymologyInternalRelation>
  <EtymologyInternalRelation>
    8
    <EtymologyInternalChild>
      10
    </EtymologyInternalChild>
  </EtymologyInternalRelation>
  <EtymologyInternalRelation>
    9
    <EtymologyInternalChild>
      10
    </EtymologyInternalChild>
  </EtymologyInternalRelation>
  <EtymologyInternalRelation>
    14
    <EtymologyInternalChild>
      13
    </EtymologyInternalChild>
  </EtymologyInternalRelation>
  <EtymologyInternalRelation>
    15
    <EtymologyInternalChild>
      13
    </EtymologyInternalChild>
  </EtymologyInternalRelation>
</EtymologyCollection>

This converts into the following JSON, where you can see that the EtymologyInternalChild nodes are removed while the EtymologyInternalRelation nodes are preserved:

"EtymologyCollection": {
    "EtymologyInternalRelation": [
        16,
        17,
        18,
        3,
        19,
        20,
        6,
        7,
        8,
        9,
        14,
        15
    ]
}

I hope this helps, I don't know if it will though!

apolo49, Apr 14 '24 20:04

@apolo49, in your example, what would your desired JSON have been? It seems like we would have to invent a convention for how to parse this, for example the way @rimutaka introduced the txt field in their initial example.

AlecTroemel, Apr 22 '24 13:04