openCypher icon indicating copy to clipboard operation
openCypher copied to clipboard

Make OPTIONAL MATCH and WHERE interaction easier to understand

Open Mats-SX opened this issue 8 years ago • 11 comments

CIR-2017-191

The fact that WHERE does not filter rows when part of an OPTIONAL MATCH has confused many users. Semantically, OPTIONAL MATCH behaves like an outer join, and will only ever be additive (>= 0) to the cardinality of the preceding query. Consider the below example query, and note the comment:

MATCH (a)
OPTIONAL MATCH (a)-->(b)
WHERE false // does not filter, only makes b null
RETURN b

In contrast, moving the WHERE subclause to be attached to the MATCH clause, the query will return an empty result:

MATCH (a)
WHERE false // filters everything
OPTIONAL MATCH (a)-->(b)
RETURN b

It has not been clear to users that WHERE is intimately attached to its preceding clause, and that it only affects the cardinality of that clause, rather than the whole query (although this difference does not exist in the cases of WITH and MATCH). This CIR requests some way of making this more clear to the user.

Idea 1

Prohibit WHERE subclauses that do not depend on variables bound by its superclause. This would make the example queries in the description of this CIR invalid.

This idea could be 'soft', and allow predicate expressions that partially depend on the bound variables:

MATCH (a)
WHERE a.prop AND false
...

Or it could be 'strict', and only allow predicate expressions without constant parts.

Idea 2

Replace the OPTIONAL MATCH clause with a subquery clause; OPTIONAL. The first query from the description would be expressed like so:

MATCH (a)
OPTIONAL { // explicitly states the semantics of its sub-part
    MATCH (a)-->(b)
    WHERE false // normal filtering
    RETURN b
}
RETURN b

Ideas 1 and 2 are not mutually exclusive.

Mats-SX avatar Feb 27 '17 14:02 Mats-SX

+1 on this CIR. Although Idea 1 has the advantage of being far less drastic in terms of existing usage, I must say the synergy of Idea 2 and subqueries is very compelling indeed...

petraselmer avatar Feb 28 '17 10:02 petraselmer

+1 for Idea 2, I like the clarity this provides.

InverseFalcon avatar Mar 01 '17 00:03 InverseFalcon

I would add Idea 0: "Take no action" - as the semantic of WHERE tied to OPTIONAL MATCH is strict (and analogous to SQL outer join). Furthermore, everything might change after adding Regular Path Queries to Cypher (https://github.com/opencypher/openCypher/issues/179)

JanekPo avatar Mar 03 '17 12:03 JanekPo

I stumbled upon this problem recently and just found this issue. I ran a couple of experiments to better understand the semantics of OPTIONAL MATCH.

I used the following toy graph:

CREATE
  (:X {val: 1})-[:E1]->(:Y {val: 2})-[:E2]->(:Z {val: 3}),
  (:X {val: 4})-[:E1]->(:Y {val: 5}),
  (:X {val: 6})

Matching on a single edge is fairly simple:

MATCH (x:X)
OPTIONAL MATCH (x)-[:E1]->(y:Y)
WHERE x.val < y.val
RETURN x, y
╒═════════╤═════════╕
│"x"      │"y"      │
╞═════════╪═════════╡
│{"val":1}│{"val":2}│
├─────────┼─────────┤
│{"val":4}│{"val":5}│
├─────────┼─────────┤
│{"val":6}│null     │
└─────────┴─────────┘

plan 10

Things get more complicated with two edges, especially for the query plan, which contains an Apply and an Optional operator.

MATCH (x:X)
OPTIONAL MATCH (x)-[:E1]->(y:Y)-[:E2]->(z:Z)
WHERE x.val < z.val
RETURN x, y, z
╒═════════╤═════════╤═════════╕
│"x"      │"y"      │"z"      │
╞═════════╪═════════╪═════════╡
│{"val":1}│{"val":2}│{"val":3}│
├─────────┼─────────┼─────────┤
│{"val":4}│null     │null     │
├─────────┼─────────┼─────────┤
│{"val":6}│null     │null     │
└─────────┴─────────┴─────────┘

plan 11

Splitting the OPTIONAL MATCH into two clauses results in different semantics:

MATCH (x:X)
OPTIONAL MATCH (x)-[:E1]->(y:Y)
OPTIONAL MATCH (y)-[:E2]->(z:Z)
WHERE x.val < z.val
RETURN x, y, z
╒═════════╤═════════╤═════════╕
│"x"      │"y"      │"z"      │
╞═════════╪═════════╪═════════╡
│{"val":1}│{"val":2}│{"val":3}│
├─────────┼─────────┼─────────┤
│{"val":4}│{"val":5}│null     │
├─────────┼─────────┼─────────┤
│{"val":6}│null     │null     │
└─────────┴─────────┴─────────┘

plan 7

Note that the TCK currently does not fully cover this issue. The only that have two edges in an OPTIONAL MATCH are the following - note that none have a WHERE clause:

$ ag 'OPTIONAL MATCH.*\(.*\(.*\(' -C 2
MatchAcceptance2.feature
316-      """
317-      MATCH (a {name: 'A'})
318:      OPTIONAL MATCH (a)-[:KNOWS]->()-[:KNOWS]->(foo)
319-      RETURN foo
320-      """
--
354-      """
355-      MATCH (a {name: 'A'})
356:      OPTIONAL MATCH p = (a)-->(b)-[*]->(c)
357-      RETURN p
358-      """

OptionalMatchAcceptance.feature
258-      """
259-      MATCH (a:Single), (c:C)
260:      OPTIONAL MATCH (a)-->(b)-->(c)
261-      RETURN b
262-      """
--
270-      """
271-      MATCH (a:A), (c:C)
272:      OPTIONAL MATCH (a)-->(b)-->(c)
273-      RETURN b
274-      """

We are happy to write some additional TCK tests if you agree that they'd be useful. The queries above might provide a good starting point for that.

Update: I checked the SIGMOD 2018 Cypher paper on the matter:

image

This, as expected, captures the semantics of the language correctly, but is rather difficult (but not impossible) to translate to standard relational algebra.

szarnyasg avatar Oct 02 '18 14:10 szarnyasg

@szarnyasg Thanks for raising this, and please do write the additional TCK tests for this (much appreciated).

petraselmer avatar Oct 04 '18 09:10 petraselmer

@szarnyasg Interesting analysis, and yes we would of course be interested in such a TCK contribution (as Petra already mentions).

You write that the TCK 'does not fully cover this issue' -- I wonder to which issue you are referring? The general issue to which this CIR is dedicated (OPT MATCH + WHERE giving allegedly surprising behaviour), or another issue in particular? Are you perhaps looking for being able to express predicates covering multiple OPTIONAL MATCH clauses, such as in your last query?

Mats-SX avatar Oct 08 '18 10:10 Mats-SX

@Mats-SX I was referring to the OPTIONAL MATCH + WHERE constellation (which is the focus of this CIR). This code search reveals that there are a few tests for multiple OPTIONAL MATCH clauses:

$ ag 'OPTIONAL MATCH.*\n.*OPTIONAL MATCH' -C 2
MatchAcceptance2.feature
1646-      """
1647-      MATCH (a:A)
1648:      OPTIONAL MATCH (a)-[:FOO]->(b:B)
1649:      OPTIONAL MATCH (b)<-[:BAR*]-(c:B)
1650-      RETURN a, b, c
1651-      """

OptionalMatchAcceptance.feature
97-      """
98-      MATCH (a:Single)
99:      OPTIONAL MATCH (a)-->(b:NonExistent)
100:      OPTIONAL MATCH (a)-->(c:NonExistent)
101-      WITH coalesce(b, c) AS x
102-      MATCH (x)-->(d)
--
282-      """
283-      MATCH (a:A), (b:B)
284:      OPTIONAL MATCH (a)-->(x)
285:      OPTIONAL MATCH (x)-[r]->(b)
286-      RETURN x, r
287-      """
--
309-    When executing query:
310-      """
311:      OPTIONAL MATCH (a:NotThere)
312:      OPTIONAL MATCH (b:NotThere)
313-      WITH a, b
314-      OPTIONAL MATCH (b)-[r:NOR_THIS]->(a)
--
329-    When executing query:
330-      """
331:      OPTIONAL MATCH (f:DoesExist)
332:      OPTIONAL MATCH (n:DoesNotExist)
333-      RETURN collect(DISTINCT n.property) AS a, collect(DISTINCT f.property) AS b
334-      """

szarnyasg avatar Oct 08 '18 10:10 szarnyasg

So the "issue" is the low level of TCK coverage for the feature? Then I understand.

Mats-SX avatar Oct 08 '18 12:10 Mats-SX

I think our "issue" was that OPTIONAL MATCH + WHERE is unintuitive at first sight to many users, and the TCK does not cover some important sub-cases (such as OPTIONAL MATCH on two edges + WHERE). In the absence of these tests, implementations might get this wrong - which is exactly what happened to us when working on similar queries in ingraph.

szarnyasg avatar Oct 08 '18 12:10 szarnyasg

I have checked queries you had posted - the results are expectedly accordant with Cypher's semantic, because of match clause definitions corresponding to 1-4 formulas that you had pasted:

eval_clause(Match,Graph,BindingTable,MatchBindingTable,Graph) :-
Match = match(MatchModifier,Pattern,Where),
findall(MatchBindingRow,
        (
         member(BindingRow,BindingTable),
         eval_match_(MatchModifier,Graph,Pattern,Where,BindingRow,MatchBindingRow)
        ),
        MatchBindingTable).

eval_match_(no_modifier,Graph,Pattern,Where,BindingRow,MatchBindingRow)
:-
eval_match(BindingRow,Graph,Pattern,MatchBindingRow),
eval_where(Graph,Where,MatchBindingRow).

eval_match_(optional,Graph,Pattern,Where,BindingRow,MatchBindingRow)
:-
findall(MatchBindingRow_,
(
eval_match(BindingRow,Graph,Pattern,MatchBindingRow_),
eval_where(Graph,Where,MatchBindingRow_)
),
MatchBindingRows_),
MatchBindingRows_ = [_|_],
!,
member(MatchBindingRow,MatchBindingRows_).


eval_match_(optional,_Graph,Pattern,_Where,BindingRow,MatchBindingRow)
:-
get_names(Pattern,Names),
list_to_set(Names,NamesSet),
maplist([Name,bind(Name,cypher_null)]>>true,NamesSet,Binds),
extend_binding_row(BindingRow,Binds,MatchBindingRow).

If you wish you can verify your respective query results in Cypher.PL Please be aware that - due to ambiguity handling - Cypher.PL console is currently running in stateless mode, it means that graph state is not preserved between query execution. Thus you have to create graph state in the same query e.g.

CREATE //graph state creation
  (:X {val: 1})-[:E1]->(:Y {val: 2})-[:E2]->(:Z {val: 3}),
  (:X {val: 4})-[:E1]->(:Y {val: 5}),
  (:X {val: 6})
with count(*) as unused_variable_
MATCH (x:X) //tested query
OPTIONAL MATCH (x)-[:E1]->(y:Y)
WHERE x.val < y.val
RETURN x, y

Cypher.PL console is available at: ssh [email protected] #password: cypherpl In case of any incompatibilities with your results please let me know.

JanekPo avatar Oct 15 '18 17:10 JanekPo

Thanks, @JanekPo for investigating this with your implementation. Neo4j returns the correct results indeed, but it is worth trying to make the interaction of OPTIONAL MATCH and WHERE more intuitive (and thus making the language more accessible). Simultaneously, the TCK should document such complex interactions of language constructs, which I tried to address with #333.

szarnyasg avatar Oct 21 '18 15:10 szarnyasg