exist icon indicating copy to clipboard operation
exist copied to clipboard

Support for XQuery Tumbling and Sliding Window

Open adamretter opened this issue 3 years ago • 18 comments

This implements both tumbling window and sliding window.

Along the way to make sure it passes the Window tests in the XQTS. We also had to fix a number of other things:

  1. Several fn:deep-equal bug-fixes
  2. Only evaluate arguments to fn:codepoint-equal once
  3. Complete re-implementation of fn:sort
  4. Almost a complete rewrite of group by expression implementation
  5. Fix comparison of NaN values for xs:float and xs:double
  6. Fix comparison of INF and -INF values for xs:float and xs:double
  7. Fix and implement op:same-key#2 for XQuery Map types
  8. Re-implement fn:deep-equal in terms of op:deepCompare
  9. Several Error Code fixes
  10. Several XDM Type fixes

By running exist-xqts-runner (from its compat/eXist-7.0.0-SNAPSHOT branch):

exist-xqts-runner-assembly-1.4.0-SNAPSHOT.jar --xqts-version HEAD --output-dir /tmp/xqts-output --test-case fn-codepoint-equal,fn-deep-equal,fn-sort,prod-GroupByClause,prod-WindowClause

We see the results:

Screenshot_select-area_20220827145105

The 7 remaining Window failures are all related to the count clause which still needs to be implemented first in a separate PR, see: https://github.com/eXist-db/exist/pull/4530.


This open source contribution to the eXist-db project was commissioned by the Office of the Historian, U.S. Department of State, https://history.state.gov/.

adamretter avatar Aug 29 '22 18:08 adamretter

Very exciting! This works with all of the code from XQuery for Humanists in the sections on Windowing except for one, which apparently isn't tested in XQTS - and which I'll submit as a new test there. This query is:

let $sentence := "This is a a test of of windowing"
let $words := fn:tokenize($sentence, " ")
for tumbling window $window in $words
    start $ws at $wsn previous $wp next $wn 
        when $wp eq $ws
return
    <window ws="{$ws}" wsn="{$wsn}" wp="{$wp}" wn="{$wn}">{ $window }</window>

The results from this branch:

<window ws="of" wsn="7" wp="of" wn="windowing">a test of</window>
<window ws="windowing" wsn="8" wp="of" wn="">of windowing</window>

The expected results, as returned by BaseX and Saxon:

<window ws="a" wsn="4" wp="a" wn="test">a test of</window>
<window ws="of" wsn="7" wp="of" wn="windowing">of windowing</window>

(The window is selected correctly, but the variables bound by the start, previous, next, and at clauses are wrong.)

Also, there seems to be one parser issue. The query:

<x/>/@when

... returns the error:

error found while executing expression: org.exist.xquery.XPathException: err:XPST0003 unexpected token: when [at line 1, column 7]

From exist.log:

2022-08-30 08:53:17,523 [qtp1304614894-67] ERROR (XQueryServlet.java [process]:549) - Cannot compile xquery: err:XPST0003 org.exist.xquery.XPathException: err:XPST0003 unexpected token: when [at line 1, column 7]
 [at line 7, column 1] 
org.exist.EXistException: Cannot compile xquery: err:XPST0003 org.exist.xquery.XPathException: err:XPST0003 unexpected token: when [at line 1, column 7]
 [at line 7, column 1]
...
Caused by: org.exist.xquery.StaticXQueryException: err:XPST0003 org.exist.xquery.XPathException: err:XPST0003 unexpected token: when [at line 1, column 7]
 [at line 7, column 1]
	at org.exist.xquery.XQuery.compile(XQuery.java:235) ~[exist-core-6.1.0-SNAPSHOT.jar:6.1.0-SNAPSHOT]
	at org.exist.xquery.XQuery.compile(XQuery.java:180) ~[exist-core-6.1.0-SNAPSHOT.jar:6.1.0-SNAPSHOT]
	at org.exist.xquery.XQuery.compile(XQuery.java:138) ~[exist-core-6.1.0-SNAPSHOT.jar:6.1.0-SNAPSHOT]
	at org.exist.http.servlets.XQueryServlet.process(XQueryServlet.java:439) ~[exist-core-6.1.0-SNAPSHOT.jar:6.1.0-SNAPSHOT]
	... 66 more

joewiz avatar Aug 30 '22 13:08 joewiz

@adamretter I can confirm that with the last commit the <x/>/@when issue is no longer present. Thank you!

joewiz avatar Aug 30 '22 16:08 joewiz

@joewiz This now passes your twt:jw-duplicate-words#0 test.

adamretter avatar Jun 06 '23 21:06 adamretter

@dizzzz @reinhapa I think this one is now ready to land. Let me know your thoughts....

adamretter avatar Jun 08 '23 02:06 adamretter

@adamretter It appears there is a failing test. From ubuntu-latest Test - see also macOS-latest Test:

Error:  Tests run: 833, Failures: 1, Errors: 0, Skipped: 4, Time elapsed: 50.683 s <<< FAILURE! - in xquery.xquery3.XQuery3Tests
Error:  xqts.org.exist-db.xquery.test.groupby.atomize-group-vars  Time elapsed: 0.103 s  <<< FAILURE!
org.junit.ComparisonFailure: expected:<...g-time&gt;&lt;person[&gt;Anton&lt;/person&gt;&lt;other&gt;&lt;person&gt;Anton&lt;/person&gt;&lt;/other&gt;&lt;/working-time&gt;&lt;working-time&gt;&lt;person&gt;Barbara&lt;/person&gt;&lt;other&gt;&lt;person&gt;Barbara&lt;/person&gt;&lt;/other&gt;&lt;/working-time&gt;&lt;working-time&gt;&lt;person&gt;Clara&lt;/person&gt;&lt;other&gt;&lt;person&gt;Clara&lt;/person&gt;&lt;/other&gt;&lt;/working-time&gt;&lt;working-time&gt;&lt;person/&gt;&lt;other/&gt;&lt;/working-time&gt;&lt;/result&gt; (assertEquals failed.)]> but was:<...g-time&gt;&lt;person[/&gt;&lt;other/&gt;&lt;/working-time&gt;&lt;working-time&gt;&lt;person&gt;Anton&lt;/person&gt;&lt;other&gt;&lt;person&gt;Anton&lt;/person&gt;&lt;/other&gt;&lt;/working-time&gt;&lt;working-time&gt;&lt;person&gt;Barbara&lt;/person&gt;&lt;other&gt;&lt;person&gt;Barbara&lt;/person&gt;&lt;/other&gt;&lt;/working-time&gt;&lt;working-time&gt;&lt;person&gt;Clara&lt;/person&gt;&lt;other&gt;&lt;person&gt;Clara&lt;/person&gt;&lt;/other&gt;&lt;/working-time&gt;&lt;/result&gt;]>

Examining the failing test, https://github.com/eXist-db/exist/blob/e72d7368dd0ae0fb7656ee3985893b3ac3ae846d/exist-core/src/test/xquery/xquery3/groupby.xql#L633-L720, the test is written to expect the following output:

<result>
    <working-time>
        <person>Anton</person>
        <other>
            <person>Anton</person>
        </other>
    </working-time>
    <working-time>
        <person>Barbara</person>
        <other>
            <person>Barbara</person>
        </other>
    </working-time>
    <working-time>
        <person>Clara</person>
        <other>
            <person>Clara</person>
        </other>
    </working-time>
    <working-time>
        <person/>
        <other/>
    </working-time>
</result>

However, it actually returns the following:

<result>
    <working-time>
        <person/>
        <other/>
    </working-time>
    <working-time>
        <person>Anton</person>
        <other>
            <person>Anton</person>
        </other>
    </working-time>
    <working-time>
        <person>Barbara</person>
        <other>
            <person>Barbara</person>
        </other>
    </working-time>
    <working-time>
        <person>Clara</person>
        <other>
            <person>Clara</person>
        </other>
    </working-time>
</result>

(The difference is in the position of the <working-time> element with the empty <person> and <other> child elements.)

The actual results also match the results of BaseX and Saxon.

Thus, I presume the test should be updated to match the actual results:

%test:assertEquals("<result><working-time><person/><other/></working-time><working-time><person>Anton</person><other><person>Anton</person></other></working-time><working-time><person>Barbara</person><other><person>Barbara</person></other></working-time><working-time><person>Clara</person><other><person>Clara</person></other></working-time></result>")

joewiz avatar Jun 08 '23 03:06 joewiz

@joewiz Thanks, I have now fixed that bad test. The rewrite of group-by that I had to do in this PR to pass the XQTS tests must have surfaced this.

This is not the first one of these I have found, I do wonder how people ended up adding tests in the past that don't comply with the XQuery specs!?!

adamretter avatar Jun 08 '23 18:06 adamretter

@adamretter Thanks! I think the test was introduced in this PR - before vs. after.

joewiz avatar Jun 08 '23 19:06 joewiz

@joewiz Ah yes, you are correct!

It has been an insane amount of work to fix so many broken things in eXist-db just so I could get to the point of being able to implement and test the Window expressions. I have been working on this for almost 1 year (since June 2022), and so there are a lot of things that I have fixed and forgotten about along the way ;-) Looking back at my notes, I see that I added this test to help me understand the non-spec compliant code of eXist-db before I refactored it; it seems I then forgot to update the test!

adamretter avatar Jun 08 '23 19:06 adamretter

@reinhapa Thanks for the review :-) I think I have addressed your concerns now, and so I have requested a further review...

adamretter avatar Jun 09 '23 09:06 adamretter

XQTS shows 638 fewer failures (an improvement of 14%) and 113 fewer errors (an improvement of 23%). See https://github.com/eXist-db/exist/actions/runs/5223111328/jobs/9429517572#step:16:25.

joewiz avatar Jun 09 '23 16:06 joewiz

@dizzzz @reinhapa I believe this is now ready to merge please

adamretter avatar Oct 14 '23 12:10 adamretter

SonarCloud Quality Gate failed.    Quality Gate failed

Bug C 5 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 47 Code Smells

70.0% 70.0% Coverage
3.4% 3.4% Duplication

idea Catch issues before they fail your Quality Gate with our IDE extension sonarlint SonarLint

sonarqubecloud[bot] avatar Oct 14 '23 12:10 sonarqubecloud[bot]

@adamretter could you address some more of those Codacy/SonarCloud issues? There seem a lot of simple ones that can be solved easy...

reinhapa avatar Oct 16 '23 17:10 reinhapa

@reinhapa each of the 5 bugs listed by SonarCloud are false positives. I fixed a few small code smell issues highlighted by SonarCloud. The remaining suggestions from SonarCloud and Codacy otherwise IMHO make the code worse!

adamretter avatar Oct 29 '23 21:10 adamretter

@adamretter there is a compile error on the test source:

/D:/a/exist/exist/exist-core/src/main/java/org/exist/xquery/BasicExpressionVisitor.java:[34,8] org.exist.xquery.BasicExpressionVisitor is not abstract and does not override abstract method visitWindowExpression(org.exist.xquery.WindowExpr) in org.exist.xquery.ExpressionVisitor

reinhapa avatar Nov 20 '23 19:11 reinhapa