xmlutil
How to simply parse/serialize a Map to/from XML that has keys as tag names wrapping text values.
I'm trying to simply parse a format like this:
<metadata>
<fieldA>this is the text value for fieldA</fieldA>
<fieldB>this is the text value for fieldB</fieldB>
<fieldC>
<fieldD>
<fieldE>this is the text value for fieldE</fieldE>
</fieldD>
</fieldC>
</metadata>
into a Map<String, String> with the entry keys being the tag names:
mapOf(
"fieldA" to "this is the text value for fieldA",
"fieldB" to "this is the text value for fieldB",
"fieldE" to "this is the text value for fieldE"
)
I've made my own policy that overrides handleUnknownContentRecovering and just puts the keys and values into a single Map like this:
dataMap[input.name.toCName()] = input.elementContentToFragment().contentString
and returns the Map for elementIndex = 0. But I'm not sure what to do about the nested tags.
I also need to serialize such a Map.
I've tried using the existing MapEncoder, but I don't need keys and values to have their own tags. Maybe it can work if the entry name could use the key name and omit the key element, with the value collapsed. But I couldn't figure out how to get it to do that.
Any help would be greatly appreciated.
The challenge is that this is not quite valid XML: tag names are intended to be well-defined. I would consider custom parsing the best solution (if you do this in a custom serializer you can still parse the values using serialization). However, there are other options:
- Probably best: using a custom serializer on the container, parse the XML manually and then use serialization for the values.
- You could do some rewriting as in https://github.com/pdvrieze/xmlutil/blob/master/examples/DYNAMIC_TAG_NAMES.md
- You could also use the fact that if you have a list of Node instances it will work (note that text element serialization is broken - also marking it as XmlValue breaks).
- Your handling of the content would work (have a look at the depth property to "deal" with nesting)
Thank you! I've actually already started implementing the dynamic tag names approach serializing a Map instead of a List. Serializing the Map without nesting worked straightaway, but I have not successfully parsed out values with MapEntrySerializer and DynamicTagReader.
But I'm coming to the realization that this approach is probably more work than it is worth, given that it is a minor part of the overall data model (the XML is the metadata of one object type in a vast sea of JSON). Since our structure is not quite valid XML, I'm thinking maybe I should just revert to doing dumb string building and parsing, and use expect/actuals for the parts that don't have pure Kotlin solutions (StringEscapeUtils.escapeXml11 in Apache Commons Text, for example).
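For the serializing half of the "dumb string building" idea, escaping can be done in common Kotlin without expect/actual. The following is only a minimal sketch: escapeXml and mapToXml are hypothetical helper names, the output is flat (no fieldC/fieldD wrappers), keys are assumed to already be valid XML tag names, and unlike escapeXml11 it does not strip characters that are illegal in XML 1.1.

```kotlin
/** Escapes the five characters that have a predefined entity in XML. */
fun escapeXml(text: String): String = buildString(text.length) {
    for (ch in text) {
        when (ch) {
            '&' -> append("&amp;")
            '<' -> append("&lt;")
            '>' -> append("&gt;")
            '"' -> append("&quot;")
            '\'' -> append("&apos;")
            else -> append(ch)
        }
    }
}

/**
 * Writes every entry as <key>escaped value</key> inside one root element.
 * Assumption: each key is already a valid XML tag name (not validated here).
 */
fun mapToXml(data: Map<String, String>, rootTag: String = "metadata"): String = buildString {
    append('<').append(rootTag).append('>')
    for ((key, value) in data) {
        append('<').append(key).append('>')
        append(escapeXml(value))
        append("</").append(key).append('>')
    }
    append("</").append(rootTag).append('>')
}
```

Calling mapToXml(mapOf("fieldA" to "a < b & c")) returns <metadata><fieldA>a &lt; b &amp; c</fieldA></metadata>.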
@gladapps You don't need to go to raw parsing with regexes or something. You can use the (separate) XML parsing support from the core library. You just create your parser and read events; if an event is a tag, handle it (read the value, perhaps recursively, then add it to your list). It is serialization that doesn't like it (it makes too many assumptions), not the XML parser. However, parsing a list of Nodes "should" work (but it doesn't due to a bug).
@pdvrieze Is there an example of creating your own parser that reads events and handles tags? Would love to have a look at it.
@susrisha The way to go is to use the object XmlStreaming (I will be transitioning to an accessor function xmlStreaming due to changes in multiplatform expect/actual). This object allows you to create instances of parsers/serializers in a platform-independent way. The "generic" variants create platform-independent implementations, so you get consistent behaviour; those can also be created directly as KtXmlReader and KtXmlWriter. Then you can use next to get the next event and nextTag to get the next tag event (note that the latter will verify that only ignorable content lies in between). You can retrieve the current event as eventType, and depending on the event type you can retrieve name, attributes, text content etc. Have a look at the documentation: https://pdvrieze.github.io/xmlutil/core/nl.adaptivity.xmlutil/-xml-reader/index.html .
Note that you don't need the serialization part of the library for this, only core.
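As a rough illustration of the event loop described above, the sketch below flattens the leaf elements of a document into a Map keyed by tag name. It assumes XmlStreaming.newReader accepts a CharSequence in the version you use (newer releases may need the xmlStreaming accessor instead); flattenLeafElements and parseMetadata are hypothetical helper names, not part of xmlutil. Duplicate tag names overwrite earlier entries, and a reader that reports entity references as separate events would need an extra branch.

```kotlin
import nl.adaptivity.xmlutil.EventType
import nl.adaptivity.xmlutil.XmlReader
import nl.adaptivity.xmlutil.XmlStreaming

/** Collects the trimmed text of every element without child elements, keyed by local tag name. */
fun flattenLeafElements(reader: XmlReader): Map<String, String> {
    val result = LinkedHashMap<String, String>()
    val text = StringBuilder()
    // True while the most recently opened element has had no child elements.
    var isLeaf = false

    while (reader.hasNext()) {
        when (reader.next()) {
            EventType.START_ELEMENT -> {
                isLeaf = true
                text.clear() // drop whitespace or text collected before this tag
            }
            EventType.TEXT, EventType.CDSECT -> text.append(reader.text)
            EventType.END_ELEMENT -> {
                // Only the innermost element is stored; wrappers such as fieldC/fieldD are skipped.
                if (isLeaf) result[reader.localName] = text.toString().trim()
                isLeaf = false
            }
            else -> Unit // comments, processing instructions, document start/end, ...
        }
    }
    return result
}

/** Assumption: XmlStreaming.newReader(CharSequence) is available in the xmlutil version in use. */
fun parseMetadata(xml: String): Map<String, String> {
    val reader = XmlStreaming.newReader(xml)
    try {
        return flattenLeafElements(reader)
    } finally {
        reader.close()
    }
}
```

Tracking a single isLeaf flag avoids recursion; if you need to know how deep a tag is nested, the reader's depth property can be checked at each START_ELEMENT instead.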