Wrong base-uri(/*) returned with Saxon 9.7 and 9.8 under certain conditions
While many of our own and our customers’ pipelines could be migrated from Calabash 1.1.15 with Saxon 9.6 to Calabash 1.1.21 with Saxon 9.8 without problems, I noticed a regression in one specific project. After hours of debugging, I managed to reproduce it with a minimal example.
The source in this example, Untitled2.xml, is:
<?xml version="1.0" encoding="UTF-8"?>
<doc xml:base="file:/foo/bar.xml">
<foo/>
</doc>
The pipeline, Untitled4.xpl, is:
<?xml version="1.0" encoding="UTF-8"?>
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
  xmlns:cx="http://xmlcalabash.com/ns/extensions"
  xmlns:c="http://www.w3.org/ns/xproc-step" version="1.0" name="mystep">
  <p:input port="source" primary="true"/>
  <p:output port="result" primary="true"/>
  <p:import href="http://xmlcalabash.com/extension/steps/library-1.0.xpl"/>
  <cx:message>
    <p:with-option name="message"
      select="'before: base-uri(): ', base-uri(),
              ', /*/@xml:base: ', /*/@xml:base,
              ', base-uri(/*): ', base-uri(/*)"/>
  </cx:message>
  <p:xslt name="xslt">
    <p:input port="parameters">
      <p:empty/>
    </p:input>
    <p:input port="stylesheet">
      <p:document href="Untitled3.xsl"/>
    </p:input>
  </p:xslt>
  <cx:message>
    <p:with-option name="message"
      select="' after: base-uri(): ', base-uri(),
              ', /*/@xml:base: ', /*/@xml:base,
              ', base-uri(/*): ', base-uri(/*)"/>
  </cx:message>
</p:declare-step>
The XSLT, Untitled3.xsl, is:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:math="http://www.w3.org/2005/xpath-functions/math"
  exclude-result-prefixes="xs math"
  version="3.0">
  <xsl:template match="node() | @*">
    <xsl:copy>
      <xsl:apply-templates select="@*, node()" mode="#current"/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="foo">
    <xsl:result-document href="f">
      <xsl:copy-of select="."/>
    </xsl:result-document>
  </xsl:template>
  <xsl:template match="@xml:base"/>
</xsl:stylesheet>
What happens during the transformation is that /*/@xml:base is removed, and /doc/foo is sent to the secondary port by an xsl:result-document instruction.
Invoking it with Calabash 1.1.22 with Saxon 9.8 or Calabash 1.1.19 with Saxon 9.7 like this:
java -jar xmlcalabash-1.1.22-98.jar -i source=Untitled2.xml Untitled4.xpl
gives the same incorrect results:
Message: before:
base-uri(): file:/C:/cygwin/home/gerrit/…/bugreport_gerrit_2018-10-01/Untitled2.xml,
/*/@xml:base: file:/foo/bar.xml,
base-uri(/*): file:/foo/bar.xml
Message: after:
base-uri(): file:/C:/cygwin/home/gerrit/…/bugreport_gerrit_2018-10-01/Untitled2.xml,
/*/@xml:base: ,
base-uri(/*): file:/C:/cygwin/home/gerrit/…/bugreport_gerrit_2018-10-01/Untitled3.xsl
<doc>
</doc>
It is incorrect because the result does not have an /*/@xml:base attribute any more and therefore base-uri(/*) should be the same as base-uri(). But base-uri(/*) is now the URI of the XSLT file. (It is not necessarily the URI of the XSLT file that contains the xsl:result-document instruction. In this example, it is, because there is only a single XSLT file.)
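To make the expectation concrete, here is a minimal Java sketch of the XML Base rule the argument relies on: an element's base URI is its xml:base value resolved against the inherited base URI, or simply the inherited base URI when the attribute is absent. The class and method names are hypothetical, and the document URI file:/work/Untitled2.xml is a stand-in for the real (elided) path. It uses only java.net.URI and does not reproduce anything Saxon or Calabash does internally.

```java
import java.net.URI;

public class BaseUriSketch {

    // Hypothetical helper: effective base URI of an element.
    // xmlBase is the value of the element's xml:base attribute,
    // or null when the element has no such attribute.
    static URI elementBaseUri(URI inheritedBase, String xmlBase) {
        if (xmlBase == null) {
            // No xml:base: the element inherits the base URI unchanged.
            return inheritedBase;
        }
        // xml:base present: resolve it against the inherited base URI.
        // An absolute value such as file:/foo/bar.xml replaces it entirely.
        return inheritedBase.resolve(xmlBase);
    }

    public static void main(String[] args) {
        // Assumed stand-in for the document's base URI.
        URI docBase = URI.create("file:/work/Untitled2.xml");

        // Before the transformation: /doc carries xml:base.
        System.out.println(elementBaseUri(docBase, "file:/foo/bar.xml"));
        // prints file:/foo/bar.xml

        // After the transformation: xml:base has been removed from /doc.
        System.out.println(elementBaseUri(docBase, null));
        // prints file:/work/Untitled2.xml
    }
}
```

Under this rule, the "after" value of base-uri(/*) can only be the document's own base URI; the stylesheet URI has no way to enter the computation.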
The correct output, produced with the Saxon-9.6 versions of XML Calabash 1.1.15 or 1.1.19, is:
Message: before:
base-uri(): file:/C:/cygwin/home/gerrit/…/bugreport_gerrit_2018-10-01/Untitled2.xml,
/*/@xml:base: file:/foo/bar.xml,
base-uri(/*): file:/foo/bar.xml
Message: after:
base-uri(): file:/C:/cygwin/home/gerrit/…/bugreport_gerrit_2018-10-01/Untitled2.xml,
/*/@xml:base: ,
base-uri(/*): file:/C:/cygwin/home/gerrit/…/bugreport_gerrit_2018-10-01/Untitled2.xml
<doc>
</doc>
It doesn’t matter that the attached XSLT is version 3.0; the same error occurs with 2.0.
There’s a lot of complex behavior going on here (thank you 1.0e6 for the small, focused test case). The relevant bit of code is in XSLT.java:
// Before Saxon 9.8, it was possible to simply set the base uri of the
// output document. That became impossible in Saxon 9.8, but I still
// think there might be XProc pipelines that rely on the fact that the
// base URI doesn't change when processed by XSLT. So we're doing it
// the hard way.
TreeWriter fixbase = new TreeWriter(runtime);
fixbase.startDocument(document.getBaseURI());
fixbase.addSubtree(xformed);
fixbase.endDocument();
xformed = fixbase.getResult();
For some reason, that doesn’t work for your stylesheet. Deep in the guts of the TinyTree implementation, there’s a systemIdMap with two entries in it, Untitled2.xml and Untitled3.xsl, and the second one is used.
In the course of misunderstanding the issue at first, I discovered that you can “fix” this bug by adding an explicit template for the document node to your stylesheet:
<xsl:template match="/">
<xsl:copy>
<xsl:apply-templates/>
</xsl:copy>
</xsl:template>
With that explicit copy, the systemIdMap has only a single value, Untitled2.xml.
Is that enough of a workaround for you?
(I’ll pass this along to Saxonica, but I have no idea if it’s a bug or not.)
Reported to Saxonica: https://markmail.org/thread/tsrtgohiby72v3ye
@ndw Maybe this issue is connected to this one: https://github.com/ndw/xmlcalabash1/issues/255
I tried it with Calabash 1.1.21 and Saxon PE 9.8.0.15. Still the same erroneous output.
Should be fixed for Saxon 9.9: https://saxonica.plan.io/issues/3956#note-10
I put together a 1.1.25 for Saxon 9.9(.1-2). Can you download it from here and let me know if it appears correct to you? (I haven't pushed it to Maven Central yet.)
Seems correct, at least for the example above.