skeleton performance: combine all single-rule patterns
Currently the skeleton generates a separate XSLT mode for every pattern. In my scenario I have many (100+) patterns, each containing only a single rule, which means the whole document tree is processed 100+ times. I cannot put all these rules into the same pattern, because several rules may match the same node and I still need every one of them to be checked on that node (also see #18). Another reason for not putting them into the same pattern is that the rules are spread across several XSD files, because I like to put the Schematron rule for a specific element right next to its definition.
My idea is to collect all patterns that contain a single rule and process them with a single xsl:apply-templates. Each matching rule would then be followed by an xsl:next-match, to ensure that every rule is still checked on every node.
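To make the proposal concrete, here is a hand-written sketch (not actual skeleton output) of what two single-rule patterns might compile to if they shared one mode. The mode name `combined`, the priority values and the messages are illustrative assumptions; the messages simply mirror the `test="true()"` reports used in the sample further down this thread.

```xml
<!-- Hypothetical sketch: two single-rule patterns merged into ONE mode.
     Each former pattern's rule becomes a template with its own priority;
     xsl:next-match hands the node on to the next matching rule. -->
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- One traversal of the document instead of one per pattern. -->
  <xsl:template match="/">
    <xsl:apply-templates select="*" mode="combined"/>
  </xsl:template>

  <!-- Rule taken from (hypothetical) pattern 1: tried first. -->
  <xsl:template match="*[contains(@class, ' topic/p ')]"
                mode="combined" priority="1001">
    <xsl:value-of select="name()"/>
    <xsl:text>: topic/p&#10;</xsl:text>
    <xsl:next-match/>
  </xsl:template>

  <!-- Rule taken from (hypothetical) pattern 2: tried next. -->
  <xsl:template match="*[contains(@class, ' custom/paragraph ')]"
                mode="combined" priority="1000">
    <xsl:value-of select="name()"/>
    <xsl:text>: custom/paragraph&#10;</xsl:text>
    <xsl:next-match/>
  </xsl:template>

  <!-- Lowest priority: no more rules for this node, so descend
       into the child elements (text nodes are never selected). -->
  <xsl:template match="*" mode="combined" priority="-2">
    <xsl:apply-templates select="*" mode="combined"/>
  </xsl:template>

</xsl:stylesheet>
```

Applied to an input such as `<topic class="- topic/topic "><p class="- topic/p ">x</p><paragraph class="- topic/p custom/paragraph ">y</paragraph></topic>`, this prints `p: topic/p`, `paragraph: topic/p` and `paragraph: custom/paragraph`, in document order. The explicit, distinct priorities keep the order in which xsl:next-match visits the rules well defined; two matching templates with equal priority would be a template conflict.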
A side effect would be that the error messages of all these patterns are ordered by their position in the document, not by pattern. At least for me, this would be a positive effect.
Another possible way to handle this is to turn each of the rules into an abstract rule, then have one pattern with one concrete rule for each possible context. Each of those concrete rules selects one or more abstract rules (using sch:extends) as necessary.
That gives a single XSLT mode, and allows the abstract rules to be distributed across XSD files too (with declarations brought in using sch:extends[@href]).
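A minimal sketch of the XSD side of that suggestion, assuming the abstract rule is embedded in `xs:appinfo` next to the element declaration. The file name `paragraph.xsd` and the rule id are hypothetical, and whether a given Schematron implementation resolves `sch:extends/@href` into an XSD like this needs checking.

```xml
<!-- paragraph.xsd (hypothetical file): the abstract rule lives next to
     the declaration of the element it checks. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:sch="http://purl.oclc.org/dsdl/schematron">
  <xs:element name="paragraph">
    <xs:annotation>
      <xs:appinfo>
        <!-- Abstract: only runs when a concrete rule extends it. -->
        <sch:rule id="custom-paragraph" abstract="true">
          <sch:report test="true()"><sch:name/>: custom/paragraph</sch:report>
        </sch:rule>
      </xs:appinfo>
    </xs:annotation>
  </xs:element>
</xs:schema>
```

The concrete rule in the single pattern would then pull it in with something like `<sch:extends href="paragraph.xsd#custom-paragraph"/>`; the document-URI-plus-fragment form of that reference is an assumption.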
I think I managed to implement what you suggested.
As a sample, I have a standard element (topic/p) and a derived one (custom/paragraph). A custom/paragraph element should therefore be checked by both abstract rules: topic-p and custom-paragraph.
<sch:pattern>
  <sch:rule context="*[contains(@class, ' custom/paragraph ')]">
    <sch:extends rule="topic-p"/>
    <sch:extends rule="custom-paragraph"/>
  </sch:rule>
  <sch:rule context="*[contains(@class, ' topic/p ')]">
    <sch:extends rule="topic-p"/>
  </sch:rule>
  <!-- abstract rules -->
  <sch:rule id="topic-p" abstract="true">
    <!-- for class topic/p -->
    <sch:report test="true()">
      <sch:name/>: topic/p
    </sch:report>
  </sch:rule>
  <sch:rule id="custom-paragraph" abstract="true">
    <!-- for class custom/paragraph, derived from topic/p -->
    <sch:report test="true()">
      <sch:name/>: custom/paragraph
    </sch:report>
  </sch:rule>
</sch:pattern>
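Within a single pattern, each node is handled only by the first rule whose context matches, which is why the custom/paragraph rule has to come before the topic/p rule: a custom/paragraph element also carries ' topic/p ' in its @class. A hypothetical instance showing which reports the pattern above should produce for each element:

```xml
<topic class="- topic/topic ">
  <!-- Matches only the topic/p rule: one report, "p: topic/p". -->
  <p class="- topic/p ">Plain paragraph</p>
  <!-- Matches the custom/paragraph rule first; through its two
       sch:extends it reports "paragraph: topic/p" and then
       "paragraph: custom/paragraph". The topic/p rule is not
       tried again for this element. -->
  <paragraph class="- topic/p custom/paragraph ">Specialized paragraph</paragraph>
</topic>
```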
I think this will work fine for several of my use-cases and I will do some experiments with it.
However, the advantage of my original suggestion was that it requires no changes to the Schematron file; it would simply improve performance in certain cases...
Yes, that was the idea.
Regards Rick
On 13/02/2017 16:49, Patrik Stellmann wrote:
So as a sample I have a standard element (topic/p) and a derived one (custom/paragraph). So the rule for the custom-paragraph should match on both: topic-p and custom-paragraph.
Presumably you could instead have 'custom-paragraph' extend 'topic-p':
<sch:pattern>
  <sch:rule context="*[contains(@class, ' custom/paragraph ')]">
    <sch:extends rule="custom-paragraph"/>
  </sch:rule>
  <sch:rule context="*[contains(@class, ' topic/p ')]">
    <sch:extends rule="topic-p"/>
  </sch:rule>
  <sch:rule id="topic-p" abstract="true">
    <sch:report test="true()"><sch:name/>: topic/p</sch:report>
  </sch:rule>
  <sch:rule id="custom-paragraph" abstract="true">
    <sch:extends rule="topic-p"/>
    <sch:report test="true()"><sch:name/>: custom/paragraph</sch:report>
  </sch:rule>
</sch:pattern>
It could make your Schematron shorter/neater, but I don't know what it would do for performance.
I think this will work fine for several of my use-cases and I will do some experiments with it.
However, the advantage of my original suggestion was that it requires no modification of the schematron file - it will just improve the performance in certain cases...
If you are looking for a performance improvement, and if you are using a non-free Saxon 9, then you could replace "contains(@class, ' custom/paragraph ')" with your own function that uses 'saxon:memo-function="yes"' so that, after the first time that Saxon has seen any combination of values, it returns the remembered result instead of doing the string comparison every time. Something like:
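<!-- ps: and xs: prefixes must be declared in scope -->
<xsl:function name="ps:contains" as="xs:boolean"
              saxon:memo-function="yes"
              xmlns:saxon="http://saxon.sf.net/">
  <!-- xs:string? so that elements without @class simply return false
       instead of causing a type error -->
  <xsl:param name="class" as="xs:string?"/>
  <xsl:param name="specialisation" as="xs:string"/>
  <xsl:sequence select="contains($class, $specialisation)"/>
</xsl:function>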
Here, too, YMMV so you'd need to test it on your own real data to see if it makes any improvement for you.
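A sketch of how such a function might be wired into the schema and used in rule contexts. The namespace URI `urn:example:ps` is a placeholder, and the sketch assumes the Schematron implementation copies top-level `xsl:function` elements into the generated XSLT when `queryBinding="xslt2"`; check that before relying on it.

```xml
<!-- Hypothetical schema: ps:contains declared inside the Schematron
     and used in place of contains() in the rule contexts. -->
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron"
            xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
            xmlns:xs="http://www.w3.org/2001/XMLSchema"
            xmlns:ps="urn:example:ps"
            queryBinding="xslt2">

  <!-- Make the ps: prefix available to rule contexts (placeholder URI). -->
  <sch:ns prefix="ps" uri="urn:example:ps"/>

  <!-- Assumption: copied verbatim into the generated stylesheet. -->
  <xsl:function name="ps:contains" as="xs:boolean"
                saxon:memo-function="yes"
                xmlns:saxon="http://saxon.sf.net/">
    <xsl:param name="class" as="xs:string?"/>
    <xsl:param name="specialisation" as="xs:string"/>
    <xsl:sequence select="contains($class, $specialisation)"/>
  </xsl:function>

  <sch:pattern>
    <sch:rule context="*[ps:contains(@class, ' custom/paragraph ')]">
      <sch:extends rule="topic-p"/>
      <sch:extends rule="custom-paragraph"/>
    </sch:rule>
    <sch:rule context="*[ps:contains(@class, ' topic/p ')]">
      <sch:extends rule="topic-p"/>
    </sch:rule>
    <sch:rule id="topic-p" abstract="true">
      <sch:report test="true()"><sch:name/>: topic/p</sch:report>
    </sch:rule>
    <sch:rule id="custom-paragraph" abstract="true">
      <sch:report test="true()"><sch:name/>: custom/paragraph</sch:report>
    </sch:rule>
  </sch:pattern>
</sch:schema>
```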
(Tony, any chance you could write up a couple of paragraphs on saxon:memo and I can stick it on Schematron.com.)
Regards Rick
In XSLT 3.0 there is a dedicated function for this: https://www.w3.org/TR/xpath-functions-31/#func-contains-token
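A sketch of the earlier sample rewritten with `fn:contains-token()`, assuming a Schematron implementation that accepts `queryBinding="xslt3"`. Because `contains-token()` splits `@class` on whitespace, the tokens no longer need the padding spaces; whether it is also faster than `contains()` is something this sketch does not establish.

```xml
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron"
            queryBinding="xslt3">
  <sch:pattern>
    <!-- Token match: no leading/trailing spaces needed around the class. -->
    <sch:rule context="*[contains-token(@class, 'custom/paragraph')]">
      <sch:extends rule="topic-p"/>
      <sch:extends rule="custom-paragraph"/>
    </sch:rule>
    <sch:rule context="*[contains-token(@class, 'topic/p')]">
      <sch:extends rule="topic-p"/>
    </sch:rule>
    <sch:rule id="topic-p" abstract="true">
      <sch:report test="true()"><sch:name/>: topic/p</sch:report>
    </sch:rule>
    <sch:rule id="custom-paragraph" abstract="true">
      <sch:report test="true()"><sch:name/>: custom/paragraph</sch:report>
    </sch:rule>
  </sch:pattern>
</sch:schema>
```

An element without a class attribute passes an empty sequence as the first argument, which `contains-token()` accepts and answers with false.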
Re the original idea of xsl:next-match: this would require two additional constraints on the Schematron schema. It must not use let-bindings in patterns (empty(sch:pattern/sch:let)), and it must not use subordinate documents (empty(sch:pattern/@documents)).
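A minimal XSLT 2.0 sketch of that eligibility check: run against a Schematron schema, it lists the single-rule patterns that meet both constraints and so could share one xsl:next-match mode. Running it after includes and abstract patterns have been expanded is an assumption about where such a check would sit; abstract patterns are skipped since they never run directly.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:sch="http://purl.oclc.org/dsdl/schematron">

  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Candidates: exactly one rule, no pattern-level sch:let,
         no @documents, and not an abstract pattern. -->
    <xsl:for-each select="//sch:pattern[count(sch:rule) = 1]
                                       [empty(sch:let)]
                                       [empty(@documents)]
                                       [not(@abstract = 'true')]">
      <!-- Prefer the pattern's @id, else fall back to its position. -->
      <xsl:value-of select="(@id, concat('pattern #', count(preceding::sch:pattern) + 1))[1]"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```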