exist icon indicating copy to clipboard operation
exist copied to clipboard

[BUG] NPE at org.exist.xquery.modules.range.RangeQueryRewriter.rewriteLocationStep

Open joewiz opened this issue 3 years ago • 7 comments

Describe the bug

In eXist develop HEAD (5.3.0-SNAPSHOT), a new NPE is raised when running a query. The same query did not raise an NPE in the current stable release, 5.2.0.

Expected behavior

The query should not raise an NPE.

To Reproduce

To reproduce, install https://github.com/HistoryAtState/release (here's a prebuilt xar: release-0.5.0.xar.zip - just remove the .zip extension), and open http://localhost:8080/exist/apps/release/ebook-batch.xq. The client sees the following 500 error:

HTTP ERROR 500 javax.servlet.ServletException: javax.servlet.ServletException: An error occurred while processing request to /exist/apps/release/ebook-batch.xq: An error occurred: null

Attached is the corresponding error from exist.log: exist-log-npe-rangequeryrewriter.txt

An excerpt:

2020-11-20 17:13:58,622 [qtp558431044-99] ERROR (EXistServlet.java [doGet]:291) - null 
java.lang.NullPointerException: null
	at org.exist.xquery.modules.range.RangeQueryRewriter.rewriteLocationStep(RangeQueryRewriter.java:109) ~[exist-index-range-5.3.0-SNAPSHOT.jar:5.3.0-SNAPSHOT]
	at org.exist.xquery.Optimizer.visitLocationStep(Optimizer.java:83) ~[exist-core-5.3.0-SNAPSHOT.jar:5.3.0-SNAPSHOT]
	at org.exist.xquery.LocationStep.accept(LocationStep.java:1242) ~[exist-core-5.3.0-SNAPSHOT.jar:5.3.0-SNAPSHOT]
	at org.exist.xquery.DynamicCardinalityCheck.accept(DynamicCardinalityCheck.java:139) ~[exist-core-5.3.0-SNAPSHOT.jar:5.3.0-SNAPSHOT]
	at org.exist.xquery.DefaultExpressionVisitor.visitBuiltinFunction(DefaultExpressionVisitor.java:48) ~[exist-core-5.3.0-SNAPSHOT.jar:5.3.0-SNAPSHOT]
	at org.exist.xquery.Function.accept(Function.java:526) ~[exist-core-5.3.0-SNAPSHOT.jar:5.3.0-SNAPSHOT]
	at org.exist.xquery.InternalFunctionCall.accept(InternalFunctionCall.java:271) ~[exist-core-5.3.0-SNAPSHOT.jar:5.3.0-SNAPSHOT]

Two clues:

  1. Performing the same steps from eXist 5.2.0 (stable) does not yield the error, but instead shows the expected HTML webpage. This suggests a regression in eXist 5.3.0-SNAPSHOT.

  2. Also, commenting out https://github.com/HistoryAtState/release/blob/master/ebook-batch.xq#L198 allows eXist 5.3.0-SNAPSHOT to sidestep the error, so something in that code path must be triggering the NPE. I wasn't able to trace the source in my code, but I don't believe my code should be able to cause an NPE.

Context (please always complete the following information):

  • OS: macOS 11.0.1
  • eXist-db version: eXist 5.3.0-SNAPSHOT f65967548a9cc5886360cac80bc5cb1959d561be 20201119071106
  • Java Version: OpenJDK 1.8.0_275 (Liberica full)

Additional context

  • How is eXist-db installed? built from source
  • Any custom changes in e.g. conf.xml? none

joewiz avatar Nov 20 '20 22:11 joewiz

That's very interesting! So this happens when trying to rewrite the query to replace a Location Step with a function which performs a Range Index comparison.

So the only way you could have null there is if the Inner Expression is not either:

  1. A General Comparison, e.g. = !=, >=, eq, ne, ge, etc...
  2. The function: matches.

The code is problematic, as it returns null in those cases, but then the code calling it does not expect/handle null and so you get an NPE.

We could easily fix the NPE itself... But, as you describe not seeing this issue before, this however is possibly not the root cause. We would need to know what expression is causing the NPE to determine the root cause.

The first step would probably be to add some trace level logging into Optimizer#visitLocationStep(LocationStep) so that we can see what it is attempting to optimize.

adamretter avatar Nov 23 '20 10:11 adamretter

@adamretter Thanks for your analysis and suggestion about how to proceed with determining the root cause. I'll let you know what we find.

joewiz avatar Dec 04 '20 05:12 joewiz

So my PR is not ok @adamretter @wolfgangmm ? shall I withdraw it ? :-(

dizzzz avatar Jan 03 '21 19:01 dizzzz

I have tried to reproduce this issue in eXist-db 6.2.0, however I am not seeing any NPE. I will try try again with other versions of eXist-db

@joewiz If can't reproduce the issue in the other versions as well, should I try to figure out why it was failing or can we mark it as done?

marmoure avatar Mar 06 '23 14:03 marmoure

I tried to reproduce the issue against the HEAD of the develop branch (current commit: ad4bf8efc3b58bdd00431761dd796a5ae7913a75)

I received the following error:

<exception>
    <path>/db/apps/release/ebook-batch.xq</path>
    <message>
           err:XPST0081 error found while loading module epub: Error while loading module modules/epub.xql: Invalid qname hc:send-request. No namespace defined for prefix hc. QName is invalid: INVALID_PREFIX [at line 104, column 30] 
    </message>
</exception>

I am assuming this has nothing to do with original problem since I got it working on version 6.2.0.

marmoure avatar Mar 06 '23 14:03 marmoure

I have tried to reproduce this issue in eXist-db 6.2.0, however I am not seeing any NPE.

@marmoure So the PR (https://github.com/eXist-db/exist/pull/3625) that @dizzzz sent and was merged (see the history above) will stop that NPE from appearing.

My feeling is that whilst the NPE no longer stops processing, there may be a different underlying bug that it is hiding. If you were to take a version of eXist-db 6.2.0, and then comment out the NPE protection (that @dizzzz added in https://github.com/eXist-db/exist/pull/3625), then you may be able to reproduce the issue that Joe reported. Ideally being able to reproduce that would show us the underlying bug, and we could discuss how to address that.

I am assuming this has nothing to do with original problem since I got it working on version 6.2.0.

@marmoure You are correct, that error you are seeing appears to be because ebook-batch.xq has not yet been updated to work with eXist-db 7.x.x.

adamretter avatar Mar 06 '23 17:03 adamretter

Starting from eXist-db 6.2.0 we locally undid the changes in the PR #3625 so that we can trace the route cause of the issue. After undoing #3625 we are again able to see the NPE.

As this looks like a potential regression in eXist-db 6.2.0 when compared to eXist-db 5.2.0, I started comparing the behaviour of 6.2.0 against 5.2.0 with the Java debugger to try and find out where their behaviour diverges in the code.

I found out that eXist-db 6.2.0 is attempting to optimise the query by rewriting a General Comparison Expression to instead use a Range Index, whereas eXist-db 5.2.0 is not rewriting the same query to use the Range Index.

The difference in behaviour was due to the query being rewritten with a range:eq function call in eXist-db 6.2.0.

In eXist-db 5.2.0 the query is:

self::tei:p[attribute::{}rend = "sectiontitlebold"]

In exist-db 6.2.0 the same query is rewritten to:

self::tei:p[range:eq(attribute::{}rend, "sectiontitlebold")]

This query rewriting is done in eXist-db 6.2.0 by the Optimizer (specifically the RangeQueryRewriter) during the analysis phase of query compilation.

So far we think it's likely due to an error in the handling of the query.

You can see the difference in that specific query by setting a breakpoint in both versions in this line. https://github.com/eXist-db/exist/blob/c8fa4958b6d4a50bd0cba7f3e76a150226414187/exist-core/src/main/java/org/exist/xquery/Optimizer.java#L81 with this specific condition: locationStep.toString().equals("self::tei:p[attribute::{}rend = \"sectiontitlebold\"]")

You'll see:

not(self::tei:p[attribute::{}rend = "sectiontitlebold"]) => exist_6.2.0
self::tei:p[attribute::{}rend = "sectiontitlebold"] => exist_5.x.x

This may mean that we have a bug that has been introduced since eXist-db 5.2.0 in the Optimizer (or that effects the Optimizer).

One observation that we made, is that when we checked the Optimizer function  https://github.com/eXist-db/exist/blob/c8fa4958b6d4a50bd0cba7f3e76a150226414187/extensions/indexes/range/src/main/java/org/exist/xquery/modules/range/RangeQueryRewriter.java#L44

There are no checks in place to make sure there is even a Range Index defined when trying to rewrite the query to optimize for a Range Index. This is not what we expected, and so this is a second potential bug or an opportunity for improvements.

marmoure avatar Mar 14 '23 14:03 marmoure