staticSearch icon indicating copy to clipboard operation
staticSearch copied to clipboard

We should be using XPath map/array handling rather than serializing to/from XML

Open joeytakeda opened this issue 2 years ago • 5 comments

In json.xsl, we create stash the created map in a variable that are then serialized using xml-to-json(). E.g.:

https://github.com/projectEndings/staticSearch/blob/c1dbd48281dfec376a3da6294a1469e384d489da/xsl/json.xsl#L204-L209

That map is not a map{} in the XPath sense, but an XML structure in the JSON namespace:

https://github.com/projectEndings/staticSearch/blob/c1dbd48281dfec376a3da6294a1469e384d489da/xsl/json.xsl#L322-L354

However, there's no need to do that since we could use <xsl:map> and set the output method to json:

 <xsl:result-document href="{$outDir}/stems/{$stem}{$versionString}.json" method="json">
       <xsl:call-template name="makeMap"/>
</xsl:result-document>
<xsl:map>
     <xsl:map-entry key="'docId'" select="string($thisDoc/@id)"/>
     <xsl:map-entry key="'docUri'" select="string($thisDoc/@data-staticSearch-relativeUri)"/>
<!--[...]-->
</xsl:map>

I'm not sure if there was a reason why we didn't do this in the first place, but I can't think of any reason not to do so at this point. It may improve performance slightly as well as possibly lessen memory use. (There's some indication that constructing sequences in the result-document allows Saxon more leverage when it comes to releasing memory; I don't imagine it would be significant, but worth testing anyway.) In any case, it's a bit cleaner and feels a bit better to use built-in functions.

The only annoying bit is that there is no such thing (yet) as an <xsl:array> instruction[1], so we would still need to stash all of the context arrays in a variable in order to create an array using the array{} constructor, but that's simple enough.

-- [1]: They are proposing such a thing for XSLT 4.0: https://www.saxonica.com/html/documentation11/xsl-elements/array.html

joeytakeda avatar Mar 17 '22 17:03 joeytakeda

I'll be intrigued to see if this makes any difference. I must admit that I've never really understood what happens when you use method="json" and what the consequences of @json-node-output-method actually are. I would guess that by this time, xml-to-json() is very solid and well-optimized, and the construction of an in-memory XML tree precursor is also pretty optimal so there would be little to gain, but it would certainly be nice to reduce the memory requirement here if it works.

martindholmes avatar Mar 17 '22 18:03 martindholmes

I'm going to experiment with this in branch iss239-json

joeytakeda avatar Mar 17 '22 18:03 joeytakeda

I have now done this in the iss239-json branch, but my tests are so far inconclusive.

In my local copy, I've added the repeat option and the -t option to the JSON step:

    <java classpath="${ssSaxon}" classname="net.sf.saxon.Transform" failonerror="true">
      <arg value="-xsl:${ssBaseDir}/xsl/json.xsl"/>
      <arg value="-s:${ssBaseDir}/xsl/json.xsl"/>
      <arg value="-repeat:6"/>
      <arg value="-t"/>
      <arg value="--suppressXsltNamespaceCheck:on"/>
    </java>

I've attached the report for the test set to this ticket (with the respective branch names in the file). I'm going to test now with just the English TEI guidelines to see if those results are more conclusive.

json_dev.txt json_iss239-json.txt

joeytakeda avatar Mar 18 '22 19:03 joeytakeda

It's not making much difference. Does it make for more human-readable, maintainable code in your view?

martindholmes avatar Mar 18 '22 20:03 martindholmes

I think it is, but at the same time, <j:map> has the advantage of having the j:array[@key], which is much nicer than stuffing a bunch of stuff in a variable to put in the array{} constructor. So I think it may be worthwhile to hold off on this (switching the code was fairly trivial, so it's fine to throw it away) until <xsl:array> is part of the spec.

joeytakeda avatar Mar 19 '22 00:03 joeytakeda