xml2json
Text is lost when transforming to JSON
I'm trying to use your package to convert an XML file to JSON.
The XML comes from Project Aon. These are books that are playable online, and various Android apps already exist that let you play them using the XML. I'm trying to make a similar app in Flutter.
Here is an example book: "view-source:https://www.projectaon.org/data/trunk/en/xml/01fftd.xml"
When I parse the XML, some of the text is missing. The XML itself seems fine: when I manually load one of their XML files into Flutter using the xml package, I can see all the text, so I don't think anything is wrong with the XML.
But when I run it through xml2json, some of the text is missing from the JSON object, no matter which of xml2json's "to" methods I use.
The books are long, so I made a small XML file containing just the meta section, in case it helps: https://gist.github.com/sketchbuch/4f000c0690bc2646dbd54aa52de706ad
If you run this through xml2json, you will see that some text is missing from the JSON, for example:
"Distribution of this Internet Edition is restricted under the terms..." "you swear revenge. But first you must reach Holmgard to warn..."
It also seems that anything after a <ch.copy/> tag is lost, such as the copyright information.
OK, which transformer are you using?
I tried all of them; whichever one I use, some text is always missing. The one I would like to use is Parker with attrs.
Yep. Taking the description:
<description class="blurb">
<p>You are Lone Wolf. In a devastating attack the Darklords have destroyed the monastery where you were learning the skills of the Kai Lords. You are the sole survivor.</p>
<p>In <strong><cite>Flight from the Dark</cite></strong>, you swear revenge. But first you must reach Holmgard to warn the King of the gathering evil. Relentlessly the servants of darkness hunt you across your country and every turn of the page presents a new challenge. Choose your skills and your weapons carefully<ch.emdash/>for they can help you succeed in the most fantastic and terrifying journey of your life.</p>
</description>
It's generating:
"description": [
{
"_class": "blurb",
"p": [
"You are Lone Wolf. In a devastating attack the Darklords have destroyed the monastery where you were learning the skills of the Kai Lords. You are the sole survivor.",
"In "
],
"strong": {
"cite": "Flight from the Dark"
},
"ch.emdash": ""
},
It's getting confused by the HTML markup: it treats the inline tags as separate child elements, and the text that follows them is lost. I'm not sure what I can do about this off the top of my head. Maybe strip this markup out of known text nodes, but I'm not sure that will work.
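One plausible explanation that fits the output above can be shown outside Dart. The sketch below uses Python's standard-library ElementTree (a swapped-in tool, not xml2json itself). In mixed content, the text that follows a child element is attached to that child as its "tail", so a converter that only records each element's leading text keeps "In " and drops the rest. `naive_text` is a hypothetical stand-in for that behaviour, not xml2json's actual code.

```python
import xml.etree.ElementTree as ET

# A trimmed copy of the second <p> from the blurb quoted above.
SNIPPET = (
    "<p>In <strong><cite>Flight from the Dark</cite></strong>, you swear revenge."
    " Choose your weapons carefully<ch.emdash/>for they can help you.</p>"
)


def naive_text(elem: ET.Element) -> str:
    """Hypothetical converter behaviour: keep only the element's leading text."""
    return elem.text or ""


def full_text(elem: ET.Element) -> str:
    """All text in document order, including the text that follows child elements."""
    return "".join(elem.itertext())


p = ET.fromstring(SNIPPET)
print(repr(naive_text(p)))  # 'In '
print(repr(full_text(p)))   # all of the text, with the markup stripped
# The "missing" text is stored as the tail of each inline child element.
for child in p:
    print(child.tag, "->", repr(child.tail))
```

The same mechanism would explain why everything after `<ch.copy/>` disappears: that text is the tail of the `ch.copy` element.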
If you wrap the text in a CDATA section, it's OK, of course:
<description class="blurb">
<![CDATA[<p>You are Lone Wolf. In a devastating attack the Darklords have destroyed the monastery where you were learning the skills of the Kai Lords. You are the sole survivor.</p>
<p>In <strong><cite>Flight from the Dark</cite></strong>, you swear revenge. But first you must reach Holmgard to warn the King of the gathering evil. Relentlessly the servants of darkness hunt you across your country and every turn of the page presents a new challenge. Choose your skills and your weapons carefully<ch.emdash/>for they can help you succeed in the most fantastic and terrifying journey of your life.</p>]]>
</description>
transforms to:
"description": [
{
"_class": "blurb",
"value": "<p>You are Lone Wolf. In a devastating attack the Darklords have destroyed the monastery where you were learning the skills of the Kai Lords. You are the sole survivor.</p><p>In <strong><cite>Flight from the Dark</cite></strong>, you swear revenge. But first you must reach Holmgard to warn the King of the gathering evil. Relentlessly the servants of darkness hunt you across your country and every turn of the page presents a new challenge. Choose your skills and your weapons carefully<ch.emdash/>for they can help you succeed in the most fantastic and terrifying journey of your life.</p>"
},
I think you're going to have to pre-process this markup and add CDATA sections to get this to work.
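A minimal pre-processing sketch, assuming the elements to wrap (here `description`, a hypothetical choice) are never nested inside themselves and never self-closing. It is a plain regex pass over the raw XML, written in Python for illustration. It is not part of xml2json, and `wrap_in_cdata` is a hypothetical helper.

```python
import re


def wrap_in_cdata(xml: str, tags: list[str]) -> str:
    """Wrap the inner content of each listed element in a CDATA section.

    Assumes the listed elements are not nested in themselves and are not
    self-closing; a real XML parser would be needed to lift those limits.
    """
    for tag in tags:
        name = re.escape(tag)
        # <tag ...attrs...> (inner content, non-greedy) </tag>
        pattern = re.compile(
            r"(<" + name + r"(?:\s[^>]*)?>)(.*?)(</" + name + r"\s*>)", re.DOTALL
        )

        def repl(m: re.Match) -> str:
            inner = m.group(2)
            if "<![CDATA[" in inner:
                return m.group(0)  # already wrapped: leave it untouched
            # "]]>" cannot appear inside CDATA, so split it across two sections.
            inner = inner.replace("]]>", "]]]]><![CDATA[>")
            return f"{m.group(1)}<![CDATA[{inner}]]>{m.group(3)}"

        xml = pattern.sub(repl, xml)
    return xml


src = '<description class="blurb"><p>In <strong>X</strong>, rest.</p></description>'
print(wrap_in_cdata(src, ["description"]))
```

One trade-off to keep in mind: entity references inside a wrapped element become literal text, so the parser no longer expands them.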
Thinking about this: the package was originally intended to convert XML generated by APIs into JSON, which is much more consumable these days, and in that case any embedded markup would, of course, be encoded as CDATA. I don't want to change any of the existing transforms to support markup like the example above. However, I don't see why another transformer that supports ebook formats couldn't be added to do this.
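One possible shape for such a transformer is sketched below in Python. It is not Dart and not part of xml2json. It behaves roughly like Parker with attrs for ordinary elements, with attributes prefixed with `_` as in the output above. When an element has mixed content, it emits an ordered list of text runs and child objects, so nothing after an inline tag is dropped. The function names and the `content`/`value` keys are hypothetical choices.

```python
import json
import xml.etree.ElementTree as ET


def has_mixed_content(elem: ET.Element) -> bool:
    """True if the element has child elements AND non-whitespace text around them."""
    if len(elem) == 0:
        return False
    runs = [elem.text] + [child.tail for child in elem]
    return any(run and run.strip() for run in runs)


def to_json(elem: ET.Element):
    """Hypothetical 'ebook' transform: Parker-like, but mixed content keeps its order."""
    attrs = {"_" + name: value for name, value in elem.attrib.items()}

    if has_mixed_content(elem):
        # Ordered list of text runs and {tag: value} objects; no tail text is dropped.
        content = [elem.text] if elem.text else []
        for child in elem:
            content.append({child.tag: to_json(child)})
            if child.tail:
                content.append(child.tail)
        return {**attrs, "content": content} if attrs else content

    if len(elem) == 0:
        # Text-only (or empty) element.
        text = elem.text or ""
        return {**attrs, "value": text} if attrs else text

    # Element-only content: group repeated child tags into lists.
    grouped: dict[str, list] = {}
    for child in elem:
        grouped.setdefault(child.tag, []).append(to_json(child))
    result = dict(attrs)
    for tag, values in grouped.items():
        result[tag] = values[0] if len(values) == 1 else values
    return result


if __name__ == "__main__":
    blurb = ET.fromstring(
        '<description class="blurb"><p>You are Lone Wolf.</p>'
        "<p>In <strong><cite>Flight from the Dark</cite></strong>, you swear revenge."
        " Choose carefully<ch.emdash/>for they can help you.</p></description>"
    )
    print(json.dumps(to_json(blurb), indent=2))
```

An ordered list gives up keyed lookup for mixed-content elements. In exchange, each text run stays in its original position relative to the inline tags. A consumer that only wants plain text can join the string entries.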
I'll have a quick look at this.