Elements are ignored if a text node is present at the same level
This issue describes an uncommon scenario where text and elements are mixed together at the same level. I have encountered it in the wild, but not in the context of XML to JSON conversion.
Example
Consider the following well-formed XML example:
<Root>
  Some text is totally valid here
  <TaxRate>7.25</TaxRate>
  <Data>
    and also at this level
    <Category>A</Category>
    <Quantity>3</Quantity>
    <Price>24.50</Price>
  </Data>
</Root>
It has 2 instances of text and element nodes at the same level. The expected JSON would be:
{
  "Root": {
    "Data": {
      "Category": "A",
      "Price": 24.5,
      "Quantity": 3,
      "txt": "and also at this level"
    },
    "TaxRate": 7.25,
    "txt": "Some text is totally valid here"
  }
}
But because the code checks for a non-empty text node ( if el.text().trim() != "" { ...) and only handles child elements in the else branch, the resulting JSON loses the elements:
{
  "Root": "Some text is totally valid here"
}
Solution
The solution would be to refactor fn convert_node in lib.rs to process the children recursively regardless of the presence of the text node.
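A minimal sketch of what that could look like, under stated assumptions: Node and Json below are hypothetical stand-ins rather than the types lib.rs actually uses, every value is kept as a string (no number inference), and repeated child names are collected into an array. The "txt" key is taken from the expected JSON above.

```rust
use std::collections::BTreeMap;

// Hypothetical, simplified JSON value; the real code would use its own JSON type.
#[derive(Debug, Clone, PartialEq)]
pub enum Json {
    Text(String),
    Array(Vec<Json>),
    Object(BTreeMap<String, Json>),
}

// Hypothetical, simplified XML element: a name, its own text, and child elements.
pub struct Node {
    pub name: String,
    pub text: String,
    pub children: Vec<Node>,
}

// Convert an element, always visiting the children even when text is present.
pub fn convert_node(el: &Node) -> Json {
    let text = el.text.trim();

    // A leaf element is represented by its text alone.
    if el.children.is_empty() {
        return Json::Text(text.to_string());
    }

    let mut map = BTreeMap::new();

    // Mixed content: keep the text under "txt" instead of dropping the children.
    if !text.is_empty() {
        map.insert("txt".to_string(), Json::Text(text.to_string()));
    }

    for child in &el.children {
        let value = convert_node(child);
        // Repeated names are merged into an array (assumed convention).
        let merged = match map.remove(&child.name) {
            None => value,
            Some(Json::Array(mut items)) => {
                items.push(value);
                Json::Array(items)
            }
            Some(first) => Json::Array(vec![first, value]),
        };
        map.insert(child.name.clone(), merged);
    }

    Json::Object(map)
}
```

With this shape, the Root element from the example would come out as an object holding "txt", "TaxRate" and "Data", and Data would hold its own "txt" next to the three child values. One open point the sketch does not settle is what to do when a child element is itself named txt; here it would be merged with the text into an array.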
This is a low priority issue. No action is expected unless we actually have someone affected by it.
I am unsure if this is related, but from what I can tell it seems to be...
I am working on parsing a PolyGlot save file, and have the output XML here:
<EtymologyCollection>
<EtymologyInternalRelation>
16
<EtymologyInternalChild>
13
</EtymologyInternalChild>
</EtymologyInternalRelation>
<EtymologyInternalRelation>
17
<EtymologyInternalChild>
13
</EtymologyInternalChild>
</EtymologyInternalRelation>
<EtymologyInternalRelation>
18
<EtymologyInternalChild>
13
</EtymologyInternalChild>
</EtymologyInternalRelation>
<EtymologyInternalRelation>
3
<EtymologyInternalChild>
6
</EtymologyInternalChild>
</EtymologyInternalRelation>
<EtymologyInternalRelation>
19
<EtymologyInternalChild>
13
</EtymologyInternalChild>
</EtymologyInternalRelation>
<EtymologyInternalRelation>
20
<EtymologyInternalChild>
13
</EtymologyInternalChild>
</EtymologyInternalRelation>
<EtymologyInternalRelation>
6
<EtymologyInternalChild>
5
</EtymologyInternalChild>
</EtymologyInternalRelation>
<EtymologyInternalRelation>
7
<EtymologyInternalChild>
10
</EtymologyInternalChild>
</EtymologyInternalRelation>
<EtymologyInternalRelation>
8
<EtymologyInternalChild>
10
</EtymologyInternalChild>
</EtymologyInternalRelation>
<EtymologyInternalRelation>
9
<EtymologyInternalChild>
10
</EtymologyInternalChild>
</EtymologyInternalRelation>
<EtymologyInternalRelation>
14
<EtymologyInternalChild>
13
</EtymologyInternalChild>
</EtymologyInternalRelation>
<EtymologyInternalRelation>
15
<EtymologyInternalChild>
13
</EtymologyInternalChild>
</EtymologyInternalRelation>
</EtymologyCollection>
This converts into the following JSON, where you can see that the EtymologyInternalChild nodes are removed, while the text values of the EtymologyInternalRelation nodes are preserved:
"EtymologyCollection": {
  "EtymologyInternalRelation": [
    16,
    17,
    18,
    3,
    19,
    20,
    6,
    7,
    8,
    9,
    14,
    15
  ]
}
I hope this helps, I don't know if it will though!
@apolo49 in your example, what would your desired JSON have been? It seems we'll have to invent a convention for parsing this kind of mixed content, similar to how @rimutaka introduced the txt field in their initial examples.