xmlcalabash1
Expected behaviour of zipping non-XML text (other than using base64).
When data comes in for pxp:zip as:
<c:data xml:base="mimetype" content-type="application/octet-stream" encoding="base64">YXBwbGljYXRpb24vZXB1Yit6aXA=</c:data>
it gets saved in the .zip file with the contents: application/epub+zip
However, if it comes in as:
<c:data xml:base="mimetype" content-type="text/plain">application/epub+zip</c:data>
it gets saved into the zip file as <?xml version="1.0" encoding="UTF-8"?><c:data xmlns:c="http://www.w3.org/ns/xproc-step" content-type="text/plain">application/epub+zip</c:data>
I was hoping for just the plain text contents in the file, no XML.
Ah, looking at the source code introduced in #133, I got it to work - just have to change the element from <c:data/> to <c:result/>:
https://github.com/ndw/xmlcalabash1/blob/saxon99/src/main/java/com/xmlcalabash/extensions/Zip.java#L595
If I'm reading that if-statement right: if the element is in the xproc-step namespace, or not namespaced, and has an encoding="base64" attribute, the text contents get base64-decoded and saved. If it's a <c:result/> element and has a content-type that starts with "text/", the text content gets saved as-is. Anything else gets serialized as XML, which is what happened to my <c:data content-type="text/plain"> above.
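To make that reading concrete, here is a rough Python sketch of the three branches as I understand them. It is not the actual Java from Zip.java: the function name zip_entry_bytes is made up for illustration, and it assumes the text branch tests for c:result, which is what the workaround above suggests. The XML fallback uses ElementTree's serializer, so prefixes and the XML declaration will not match what Calabash writes.

```python
import base64
import xml.etree.ElementTree as ET

C_NS = "http://www.w3.org/ns/xproc-step"


def zip_entry_bytes(xml_text: str) -> bytes:
    """Hypothetical helper: bytes that would be stored for one source document."""
    root = ET.fromstring(xml_text)

    # ElementTree spells a namespaced tag as "{uri}local".
    if root.tag.startswith("{"):
        ns, local = root.tag[1:].split("}", 1)
    else:
        ns, local = "", root.tag

    content_type = root.get("content-type") or ""
    text = "".join(root.itertext())

    # Branch 1: xproc-step namespace or no namespace, with encoding="base64".
    if ns in (C_NS, "") and root.get("encoding") == "base64":
        return base64.b64decode(text)

    # Branch 2 (assumed): c:result with a text/* content type is stored as text.
    if ns == C_NS and local == "result" and content_type.startswith("text/"):
        return text.encode("utf-8")

    # Branch 3: everything else is stored as serialized XML, wrapper included.
    return ET.tostring(root, encoding="utf-8")


if __name__ == "__main__":
    wrapped = (
        '<c:data xmlns:c="http://www.w3.org/ns/xproc-step" '
        'content-type="text/plain">application/epub+zip</c:data>'
    )
    print(zip_entry_bytes(wrapped))
```

Under these assumptions, the base64 c:data and the text/plain c:result both yield the bare mimetype string, while the text/plain c:data falls through to the XML branch.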
That seems a little non-obvious / undocumented. The XProc spec doesn't seem to document the attributes on <c:result/>, while <c:data/> is documented, and is what is returned by an evaluation of p:data. In the spec (e.g. for the validate-with-relax-ng or xquery steps), it mentions treating the contents of c:data as text.
I'm not clear on the differences between those two elements - some steps return <c:data/>, and some return <c:result/>?
Anyway, as a consequence of this - running something sourced with <p:data content-type="text/plain" href="mimetype"/> through pxp:zip will save the file, which was read as plain text, with an XML wrapper around it, while <p:data href="mimetype"/> will get saved as plain text (no wrapper), because p:data defaults to emitting the element as content-type="application/octet-stream", base64-encoded as in the first example.
I'm not seeing <c:result content-type="text/plain"/> elements being generated by any of the steps I'm looking at when reading through the spec - they come back with no attributes. Curious when that would come up (besides writing one literally inside <p:inline>).