Return Unrecognized Elements as a String?

Open eyallen opened this issue 5 years ago • 6 comments

I am trying to implement a parser for an XML format that can contain extensions. These extensions are valid XML, but we don't know in advance what structure they will have.

I'd like to return any unrecognized elements as a string for later parsing by consumers. Is this possible?

An example:

<Data version="2.0">
  <Sample>Testing</Sample>
  <Extensions>
    <!-- This could be anything -->
  </Extensions>
</Data>

I'd like the following structure:

struct Data: Codable {
  let version: String
  let extensions: String?
}

It seems that since the content of Extensions is valid XML, it always gets parsed, and thus I am unable to convert it back to a string.

I was hoping to be able to do this via init(from decoder: Decoder), but at that point it appears it is already too late.

Any ideas?

eyallen avatar May 22 '19 17:05 eyallen

Hi @eyallen, thank you for asking this question, I really should clarify this in the README. I think a similar thing was asked in #90, do you find my answer to that issue applicable in your case?

MaxDesiatov avatar May 22 '19 17:05 MaxDesiatov

@MaxDesiatov I could be reading it wrong, but that feels like a separate use case to me.

In my case, I never want the XML to end up in a struct as I have no idea what the format would be.

To extend my example above, I could receive:

<Data version="2.0">
  <Sample>Testing</Sample>
  <Extensions>
    <Extension>some data</Extension>
    <Extension><SomethingDifferent></SomethingDifferent></Extension>
  </Extensions>
</Data>

Or even:

<Data version="2.0">
  <Sample>Testing</Sample>
  <Extensions>
    <Extension>SomethingCompletelyDifferent</Extension>
  </Extensions>
</Data>

The only thing I know when parsing the XML is:

  1. There may be an Extensions node
  2. Inside the Extensions node, each extension is wrapped in its own Extension element
  3. Inside an Extension element, the content may be valid XML, but that is not guaranteed. It could be JSON.

I'd like to surface each Extension node as a struct whose only field is a string holding whatever is inside it, but this breaks down when the content of Extension is valid XML.

I think what I really need is access to the XMLCoderElement for each Extension, so that I can return either its value or its XML string, and my struct could be:

struct Extension: Codable {
  let contents: String?
}

I'm not sure how to do that, or whether it is even possible.

eyallen avatar May 22 '19 17:05 eyallen

If you'd like to represent the inner unescaped XML as a raw string, that's unfortunately not supported in the underlying XMLParser from Swift's Foundation module. If escaping that data is an option, that's very similar to CDATA, especially the way it's used in formats such as RSS. CDATA is supported in both XMLParser and XMLCoder and would be decoded as a String. Would that work well in your case?
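
To illustrate the CDATA route at the XMLParser level, here is a minimal sketch using only Foundation. ExtensionCollector is a hypothetical helper, not part of XMLCoder or of the code discussed in this issue; it assumes each Extension element holds a single CDATA section encoded as UTF-8.

```swift
import Foundation
#if canImport(FoundationXML)
import FoundationXML  // XMLParser lives in this module on Linux
#endif

// Hypothetical delegate: collects the CDATA payload of each <Extension> element.
final class ExtensionCollector: NSObject, XMLParserDelegate {
    private(set) var extensions: [String] = []
    // Non-nil only while the parser is inside an <Extension> element.
    private var current: String?

    func parser(_ parser: XMLParser, didStartElement elementName: String,
                namespaceURI: String?, qualifiedName qName: String?,
                attributes attributeDict: [String: String] = [:]) {
        if elementName == "Extension" { current = "" }
    }

    // CDATA arrives as raw bytes; the parser does not interpret its content as markup.
    func parser(_ parser: XMLParser, foundCDATA CDATABlock: Data) {
        guard let soFar = current,
              let text = String(data: CDATABlock, encoding: .utf8) else { return }
        current = soFar + text
    }

    func parser(_ parser: XMLParser, didEndElement elementName: String,
                namespaceURI: String?, qualifiedName qName: String?) {
        if elementName == "Extension", let text = current {
            extensions.append(text)
            current = nil
        }
    }
}

let xml = """
<Data version="2.0">
  <Sample>Testing</Sample>
  <Extensions>
    <Extension><![CDATA[<SomethingDifferent></SomethingDifferent>]]></Extension>
  </Extensions>
</Data>
"""

let collector = ExtensionCollector()
let parser = XMLParser(data: Data(xml.utf8))
parser.delegate = collector
let parsedOK = parser.parse()
```

After parsing, collector.extensions holds the inner markup of each Extension as an unparsed string, which consumers can process later in whatever format it turns out to be.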

MaxDesiatov avatar May 22 '19 17:05 MaxDesiatov

That's good to know, thanks.

I was hoping, in the XML case at least, that I could detect an Extension element and then serialize to a string from there. I'd expect that would require me to have access to the XMLParserDelegate, though I see now that it merely reports start/end elements without giving me the underlying parsed element or an index into the Data.

eyallen avatar May 22 '19 18:05 eyallen

Yeah, with the current XMLParserDelegate API that would be problematic without CDATA, so I'd highly recommend using CDATA if possible. Otherwise it would also hurt performance: XMLParser would detect the inner content as proper XML and parse it, and then XMLCoder would need to serialize it back for your consumption, which means a needless decoding/encoding cycle.

MaxDesiatov avatar May 22 '19 20:05 MaxDesiatov

Thanks. Sounds like I will just need to pre-process the XML on the client side and wrap the contents of the Extension tags in CDATA, which will let me extract them as a String for later processing.

I'll give that a shot.
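
A rough sketch of that pre-processing step, using only Foundation. The function name wrapExtensionsInCDATA is made up for illustration, and the approach assumes that Extension tags carry no attributes, are never nested inside each other, are not already wrapped in CDATA, and never contain the sequence ]]> in their content. Input that breaks any of those assumptions would need a more careful scanner.

```swift
import Foundation

/// Hypothetical helper: wraps the content of every <Extension>...</Extension>
/// pair in a CDATA section so the XML parser treats it as plain text.
func wrapExtensionsInCDATA(_ xml: String) -> String {
    // (?s) lets "." match newlines; ".*?" is lazy, so each match stops at the
    // first closing </Extension> tag. The literal ">" after "Extension" keeps
    // the pattern from matching the <Extensions> container element.
    let pattern = "(?s)<Extension>(.*?)</Extension>"
    // $1 in the template refers to the captured inner content.
    return xml.replacingOccurrences(
        of: pattern,
        with: "<Extension><![CDATA[$1]]></Extension>",
        options: .regularExpression
    )
}

let input = "<Extensions><Extension><A>1</A></Extension><Extension>{\"k\":1}</Extension></Extensions>"
let wrapped = wrapExtensionsInCDATA(input)
```

The wrapped string can then be converted to Data and handed to the decoder, with each Extension decoded as a String.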

eyallen avatar May 22 '19 20:05 eyallen