node-xml2js
Preserving order of HTML text + tags
Hi, I'm trying to parse XML files from an external service that contain a block of HTML. I use the parser to collect some other information, then target the HTML block and use the builder to rebuild it, but I'm running into an issue.
Say you have this originally:
<p>
<span class="location">TORONTO, </span>
It is a nice day...
</p>
The parser turns this into:
{
_: 'It is a nice day...',
span: [{
$: { class: 'location' },
_: 'TORONTO,'
}]
}
This causes the builder to output the two parts backwards:
<p>
It is a nice day...
<span class="location">TORONTO, </span>
</p>
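A plausible explanation, assuming the builder writes an element's properties in the order they appear on the object: the default parse result stores the text under _ and groups child elements by tag name, so nothing records that the span came before the text. The sketch below uses a hypothetical naiveBuild function (not part of xml2js) that walks keys in insertion order; on the object shown above it reproduces the reversal.

```javascript
// Hand-written copy of the default parse result for <p> shown above
// (no explicitChildren): text under "_", child elements grouped by name.
const defaultP = {
  _: 'It is a nice day...',
  span: [{ $: { class: 'location' }, _: 'TORONTO,' }],
};

// Hypothetical naive serializer: walks keys in insertion order.
// "_" is text, "$" holds attributes, any other key is an array of children.
function naiveBuild(name, node) {
  if (typeof node === 'string') return `<${name}>${node}</${name}>`;
  const attrs = Object.entries(node.$ || {})
    .map(([k, v]) => ` ${k}="${v}"`)
    .join('');
  let inner = '';
  for (const [key, value] of Object.entries(node)) {
    if (key === '$') continue; // attributes already rendered
    if (key === '_') inner += value; // text first, because "_" is the first key
    else inner += value.map((child) => naiveBuild(key, child)).join('');
  }
  return `<${name}${attrs}>${inner}</${name}>`;
}

console.log(naiveBuild('p', defaultP));
// prints: <p>It is a nice day...<span class="location">TORONTO,</span></p>
```

The point is that once the text and the span live under separate keys, any serializer has to pick an order, and the original position is simply not in the data.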
Am I missing a useful option to preserve the order? Or is there a way to stop the parser once it gets to a certain tag so that the HTML is never parsed?
Maybe setting these options can help: charsAsChildren: true, explicitChildren: true, preserveChildrenOrder: true
In that case you get an ordered array of children (stored under the $$ key by default) with 2 elements: the span as [0] and the text as [1].
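To rebuild the HTML in the right order from that result, one option is to serialize by walking the ordered $$ array instead of the named properties. This is a minimal sketch under assumptions: the parsedP object is hand-written to mimic the shape xml2js produces with those three options (element names under #name, attributes under $, text nodes named __text__ with their text under _), and toHtml is a hypothetical helper, not an xml2js API.

```javascript
// Hand-written stand-in (assumed shape) for the parsed <p> element with
// explicitChildren, preserveChildrenOrder and charsAsChildren enabled.
const parsedP = {
  '#name': 'p',
  $$: [
    {
      '#name': 'span',
      $: { class: 'location' },
      _: 'TORONTO,',
      $$: [{ '#name': '__text__', _: 'TORONTO,' }],
    },
    { '#name': '__text__', _: 'It is a nice day...' },
  ],
};

// Escape characters that are special in text and attribute values.
function escapeXml(s) {
  return String(s)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Hypothetical serializer: follows "$$" so elements and text keep document order.
function toHtml(node) {
  if (node['#name'] === '__text__') return escapeXml(node._);
  const attrs = Object.entries(node.$ || {})
    .map(([k, v]) => ` ${k}="${escapeXml(v)}"`)
    .join('');
  // Prefer the ordered children; fall back to plain text content if absent.
  const inner = node.$$ ? node.$$.map(toHtml).join('') : escapeXml(node._ || '');
  return `<${node['#name']}${attrs}>${inner}</${node['#name']}>`;
}

console.log(toHtml(parsedP));
// prints: <p><span class="location">TORONTO,</span>It is a nice day...</p>
```

Because only $$ is read, the duplicated named properties (span, _) are ignored rather than emitted twice.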
When I do this, the resulting object seems to contain every piece of text and every child twice (once under its own name and once in the ordered array). That costs more memory and processing time than should be necessary. Is there any fix for this?
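One possible workaround, if the duplication mainly hurts whatever consumes the parsed object, is to prune it after parsing so each node keeps only its name, attributes and ordered children. This is a sketch under assumptions: duplicatedP is hand-written to mimic the duplicated shape described above, and pruneToOrdered is a hypothetical helper, not an xml2js option. It does not lower the parser's own peak memory, since the duplicates are still built first.

```javascript
// Hand-written stand-in (assumed shape): each child appears both under
// its tag name and in the ordered "$$" array.
const span = {
  '#name': 'span',
  $: { class: 'location' },
  _: 'TORONTO,',
  $$: [{ '#name': '__text__', _: 'TORONTO,' }],
};
const duplicatedP = {
  '#name': 'p',
  _: 'It is a nice day...',
  span: [span],
  $$: [span, { '#name': '__text__', _: 'It is a nice day...' }],
};

// Hypothetical helper: copy only "#name", "$" and "$$" (recursively);
// nodes without "$$", such as text nodes, keep their "_" text.
function pruneToOrdered(node) {
  const out = { '#name': node['#name'] };
  if (node.$) out.$ = node.$;
  if (node.$$) out.$$ = node.$$.map(pruneToOrdered);
  else if (node._ !== undefined) out._ = node._;
  return out;
}

console.log(JSON.stringify(pruneToOrdered(duplicatedP)));
```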
Agreed, this is not an acceptable solution. I spent so much time figuring out how to preserve the order. I can't believe it is this complicated and that the output is so convoluted. I cannot afford to have all the data duplicated in the JSON, and the output structure is really mind-bending.
explicitChildren: true, preserveChildrenOrder: true, charsAsChildren: true
Even if you use the three options above, you will get a lot of duplicated data. I switched to https://github.com/nashwaan/xml-js, which met my requirements: it preserves the order.
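For comparison, xml-js's non-compact output, as I understand it, stores each node once in an ordered elements array, with type, name, attributes and text fields. The object below is hand-written to mimic that shape (an assumption, not captured library output), and collectText is a hypothetical helper showing that a single in-order walk recovers the content with no deduplication step.

```javascript
// Hand-written object mimicking (as an assumption) xml-js non-compact
// output for the <p> example: one ordered "elements" array per node.
const xmlJsDoc = {
  elements: [
    {
      type: 'element',
      name: 'p',
      elements: [
        {
          type: 'element',
          name: 'span',
          attributes: { class: 'location' },
          elements: [{ type: 'text', text: 'TORONTO, ' }],
        },
        { type: 'text', text: 'It is a nice day...' },
      ],
    },
  ],
};

// Hypothetical helper: concatenate text nodes depth-first in document order.
function collectText(node) {
  if (node.type === 'text') return node.text;
  return (node.elements || []).map(collectText).join('');
}

console.log(collectText(xmlJsDoc)); // prints: TORONTO, It is a nice day...
```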
Also, if your input XML is deeply nested, you might find the https://marketplace.visualstudio.com/items?itemName=nidu.copy-json-path plugin useful (I guess similar plugins exist for other IDEs as well). It makes navigating the output JSON much easier and saved me a lot of time.