openCypher icon indicating copy to clipboard operation
openCypher copied to clipboard

Short-circuit evaluation in openCypher

Open ExtReMLapin opened this issue 1 year ago • 1 comments

Hello, I'm using openCypher in ArcadeDB.

I met a bug that was first in the developer in a hotfix really quickly, but we should have never met this bug, it was isLower running on null fields.

What's weird is the first evaluation condition in the query should never have let isLower run on a node type that didn't have this property.

The same way that running this query something like 10 times (before applying the hotfix) didn't throw an error 10 times out of 10.

This is what makes me think that there was an issue in the order of the execution/validation of the conditions in the query.

Here is the issue on the other repo : https://github.com/ArcadeData/arcadedb/issues/1563

Here was the query :

MATCH (n) WHERE (n:NER OR n:THEME) AND
(
    toLower(n.identity) CONTAINS 'technology' AND
    toLower(n.identity) CONTAINS 'science' AND
    toLower(n.identity) CONTAINS 'notable advancements' AND
    toLower(n.identity) CONTAINS '2020'
)
RETURN n

The first condition n:NER OR n:THEME ensures the query runs ONLY on nodes where the identity property exists. Yet it managed to error (not every time).

As it doesn't always error it sounds like the conditions are stored in an unordered set and the order is not respected.

ExtReMLapin avatar Apr 24 '24 05:04 ExtReMLapin

Hi,

this is not an openCypher question. openCypher publishes a specification of the language. Your problem seems to be with the implementation of that specification, so you should contact your database provider (as I see you have done).

In general, database languages do not specify the order of evaluation of filtering conditions. This allows implementers to create query planners that optimize the order of evaluation based on factors like selectivity of the conditions and/or the presence or not of indexes on the properties used.

This is the case for openCypher (and for SQL too), afaik.

I am a bit surprised, too, that the order changes in consecutive runs, but again, that is something to discuss with your database provider.

Knowing nothing about the database you are using, I would suggest to split the match into two: MATCH (n) WHERE (n:NER OR n:THEME) MATCH (n) WHERE toLower(n.identity) CONTAINS 'technology' AND toLower(n.identity) CONTAINS 'science' AND toLower(n.identity) CONTAINS 'notable advancements' AND toLower(n.identity) CONTAINS '2020' RETURN n

that should guarantee that the first match conditions are evaluated first.

malencv avatar May 09 '24 13:05 malencv