xmlcalabash1

Expected behaviour of zipping non-XML text (other than using base64).

Open LeifW opened this issue 5 years ago • 1 comments

When data comes in for pxp:zip as:

<c:data xml:base="mimetype" content-type="application/octet-stream" encoding="base64">YXBwbGljYXRpb24vZXB1Yit6aXA=</c:data>

it gets saved in the .zip file with the contents:

application/epub+zip

However, if it comes in as:

<c:data xml:base="mimetype" content-type="text/plain">application/epub+zip</c:data>

it gets saved into the zip file as <?xml version="1.0" encoding="UTF-8"?><c:data xmlns:c="http://www.w3.org/ns/xproc-step" content-type="text/plain">application/epub+zip</c:data>

I was hoping for just the plain text contents in the file, no XML.
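For reference, the base64 payload in the first example can be checked independently of XML Calabash. This is only a minimal sketch using the JDK's `java.util.Base64`; the class name `MimetypePayload` is made up for illustration and is not part of xmlcalabash1.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper, not part of xmlcalabash1.
final class MimetypePayload {
    /** Decodes a base64 string and interprets the resulting bytes as UTF-8 text. */
    static String decode(String base64) {
        byte[] raw = Base64.getDecoder().decode(base64);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The text content of the base64-encoded <c:data/> element above.
        System.out.println(decode("YXBwbGljYXRpb24vZXB1Yit6aXA="));
    }
}
```

The decoded string is the same `application/epub+zip` that appears in the text/plain variant, so both inputs carry the same payload and differ only in how pxp:zip stores them.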

Ah, looking at the source code introduced in #133, I got it to work: I just had to change the element from <c:data/> to <c:result/>: https://github.com/ndw/xmlcalabash1/blob/saxon99/src/main/java/com/xmlcalabash/extensions/Zip.java#L595

If I'm reading that if-statement right: if the element is in the xproc-step namespace (or has no namespace) and has an encoding="base64" attribute, the text content gets base64-decoded and saved. If it's a <c:result/> element and has a content-type that starts with "text/", the text content gets saved as-is. Anything else is serialized as XML, which is what happened to my text/plain <c:data/>.
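To make that reading concrete, here is a rough sketch of the decision as I understand it. It is not the actual code from Zip.java: the class, enum, and method names are all made up, and it assumes the text branch applies to <c:result/> (which is what the workaround suggests) and that everything else falls through to XML serialization.

```java
// Hypothetical model of the branching described above; NOT the real Zip.java code.
final class ZipStoreSketch {
    enum Mode { BINARY, TEXT, XML }

    static final String NS_XPROC_STEP = "http://www.w3.org/ns/xproc-step";

    /**
     * Picks how the root element of a source document would be stored.
     * Use "" for namespaceUri when the element has no namespace;
     * encoding and contentType are null when the attribute is absent.
     */
    static Mode choose(String namespaceUri, String localName,
                       String encoding, String contentType) {
        boolean stepOrNoNamespace =
                NS_XPROC_STEP.equals(namespaceUri) || "".equals(namespaceUri);
        if (stepOrNoNamespace && "base64".equals(encoding)) {
            return Mode.BINARY; // text content is base64-decoded and stored
        }
        boolean isResult =
                NS_XPROC_STEP.equals(namespaceUri) && "result".equals(localName);
        if (isResult && contentType != null && contentType.startsWith("text/")) {
            return Mode.TEXT; // text content is stored without the wrapper
        }
        return Mode.XML; // whole element is serialized, wrapper included
    }

    public static void main(String[] args) {
        System.out.println(choose(NS_XPROC_STEP, "data", "base64", "application/octet-stream"));
        System.out.println(choose(NS_XPROC_STEP, "data", null, "text/plain"));
        System.out.println(choose(NS_XPROC_STEP, "result", null, "text/plain"));
    }
}
```

Under those assumptions the three calls in `main` correspond to the base64 <c:data/>, the text/plain <c:data/>, and the <c:result/> workaround, and they yield BINARY, XML, and TEXT respectively, matching what I observed.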

That seems a little non-obvious, and it's undocumented. The XProc spec doesn't seem to document any attributes on <c:result/>, while <c:data/> is documented and is what p:data returns. The spec (e.g. for the p:validate-with-relax-ng and p:xquery steps) mentions treating the contents of c:data as text.

I'm not clear on the difference between those two elements: some steps return <c:data/>, and some return <c:result/>? Anyway, as a consequence, running something sourced with <p:data content-type="text/plain" href="mimetype"/> through pxp:zip saves the file you read as plain text with an XML wrapper, while <p:data href="mimetype"/> gets saved as plain text (no wrapper), because p:data defaults to emitting the element with content-type="application/octet-stream", base64-encoded.

LeifW, Feb 22 '20 09:02

I'm not seeing <c:result content-type="text/plain"/> elements being generated by any of the steps I'm looking at when reading through the spec - they come back with no attributes. Curious when that would come up (besides writing one literally inside <p:inline>).

LeifW, Feb 22 '20 09:02