
[CYPHER] MATCH + optional match (when >1 match in optional) stops at first match iteration

Open camomiy opened this issue 11 months ago • 7 comments

Hello

ArcadeDB Version:

ArcadeDB Server v24.11.2 (build 055592c73d27d894c26f3faaf7df22e15c28f03d/1733838531445/main)

OS and JDK Version:

Running on Linux 5.15.0-105-generic - OpenJDK 64-Bit Server VM 17.0.13

Expected behavior

Here is a part of our graph :

match(n) 
Where n:DOCUMENT or n:CHUNK or n.subtype='LOCATION' 
return n

[Image: screenshot of the graph]

Long story short, with the chunks visible in this screenshot, we should be able to find multiple DOCUMENT nodes using this query:

MATCH (doc:DOCUMENT)
OPTIONAL MATCH (ner:NER {subtype: 'LOCATION'})<--(linkedChunk:CHUNK)-->(doc:DOCUMENT)
RETURN ID(doc), ID(ner)


Simply because, in the screenshot, one NER is linked to two CHUNK nodes, each of which is linked to its own DOCUMENT node.

However, running this query only returns this table :

ID(doc)	ID(ner)
#25:0	#82:0
#25:0	#85:0
#25:0	#88:0
#25:0	#91:0

Removing the {subtype: 'LOCATION'} filter from the query returns more rows, but the result still stops at the first DOCUMENT found.

I understand it can be confusing, so as a TL;DR, here is an example showing that something is wrong:


MATCH (doc:DOCUMENT)
OPTIONAL MATCH (ner:NER {subtype: 'LOCATION'})<--(linkedChunk:CHUNK)-->(doc:DOCUMENT)
RETURN ID(doc)

(This should match real things, i.e. all docs. It's an OPTIONAL match, so even if it doesn't find anything, it should return at least as many rows as the non-optional MATCH query alone.)



ID(doc)
#25:0
#25:0
#25:0
#25:0

Yes, lines are duplicated because there are 4 NER nodes matching.


MATCH (doc:DOCUMENT)
OPTIONAL MATCH (:RANDOM_DUMMY_NON_EXISTING_TYPE {subtype: 'LOCATION'})<--(linkedChunk:CHUNK)-->(doc:DOCUMENT)
RETURN ID(doc)

ID(doc)
#25:0
#28:0
#31:0

TL;DR :

Even if the OPTIONAL MATCH clause finds no match, the query should return a null value in the second column and still return all the DOCUMENT nodes.
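
The behavior expected here amounts to a left outer join: every row from the first MATCH survives, padded with null when the optional pattern finds nothing. Below is a minimal in-memory sketch of that semantics in plain Java. The class name, method name and ids (d1, n1, ...) are made up for illustration; this is not ArcadeDB or cypher-for-gremlin code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory sketch; not ArcadeDB or cypher-for-gremlin code.
public class OptionalMatchSketch {

    // Emulates: MATCH (doc) OPTIONAL MATCH (ner)<--(:CHUNK)-->(doc) RETURN doc, ner
    // nersByDoc maps a document id to the NER ids reachable through its chunks.
    public static List<String[]> optionalMatch(List<String> docs,
                                               Map<String, List<String>> nersByDoc) {
        List<String[]> rows = new ArrayList<>();
        for (String doc : docs) {
            List<String> ners = nersByDoc.getOrDefault(doc, List.of());
            if (ners.isEmpty()) {
                // No optional match: keep the document and pad the column with null.
                rows.add(new String[] {doc, null});
            } else {
                // One row per optional match for this document.
                for (String ner : ners) {
                    rows.add(new String[] {doc, ner});
                }
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // Made-up ids: d1 reaches two NER nodes, d2 and d3 reach none.
        List<String> docs = List.of("d1", "d2", "d3");
        Map<String, List<String>> nersByDoc = Map.of("d1", List.of("n1", "n2"));
        for (String[] row : optionalMatch(docs, nersByDoc)) {
            System.out.println(row[0] + "\t" + row[1]);
        }
    }
}
```

In this sketch every document appears in the output at least once, whether or not it has a matching NER, which is the property the queries above fail to show.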

Here is the database backup dump :

POLAIRE2-backup-20250123-094141512.zip

camomiy avatar Jan 23 '25 10:01 camomiy

Feel free to ping me if you need any details; we created this issue at the office.

ExtReMLapin avatar Jan 24 '25 13:01 ExtReMLapin

I assigned this issue to the next 25.2.1 release.
Right now we are consolidating the work streams for 25.1.1. The best way to help us in the triage process would be to provide a test in Java through a PR. That said, if you can't do that, the sample database and the queries are gold for us.

robfrank avatar Jan 24 '25 14:01 robfrank

Another pair of queries that should make things clearer. Both should return the same results:

MATCH (doc:DOCUMENT)
OPTIONAL MATCH (ner {subtype: 'LOCATION'})<--(linkedChunk:CHUNK)-->(doc:DOCUMENT)
RETURN distinct(ID(doc)), head(LABELS(ner))

(ID(doc))	head(LABELS(ner))
#25:0	NER
#28:0	NER
#31:0	NER


MATCH (doc:DOCUMENT)
OPTIONAL MATCH (ner:NER {subtype: 'LOCATION'})<--(linkedChunk:CHUNK)-->(doc:DOCUMENT)
RETURN distinct(ID(doc)), head(LABELS(ner))

(ID(doc))	head(LABELS(ner))
#25:0	NER

ExtReMLapin avatar Mar 12 '25 08:03 ExtReMLapin

MATCH (doc:DOCUMENT)
OPTIONAL MATCH (ner:NER {subtype: 'LOCATION'})
RETURN distinct(ID(doc))

Is currently translated to

g.V().as('doc').hasLabel('DOCUMENT').choose(__.V().hasLabel('NER').has('subtype', eq('LOCATION')), __.V().hasLabel('NER').has('subtype', eq('LOCATION')), __.constant('  cypher.null')).select('doc').project('(ID(doc))').by(__.choose(neq('  cypher.null'), __.id())).dedup()

but it should be translated to

g.V().hasLabel('DOCUMENT').as('doc')
    .optional(__.V().hasLabel('NER').has('subtype', 'LOCATION'))
    .select('doc')
    .id()
    .dedup()
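
For readers less familiar with Gremlin: as far as the TinkerPop documentation describes it, optional() emits the results of the inner traversal if there are any, and otherwise passes the incoming traverser through unchanged. Here is a plain-Java sketch of that per-traverser behavior, with hypothetical names and no TinkerPop dependency:

```java
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch of the per-traverser behavior of Gremlin's optional() step.
public class OptionalStepSketch {

    // Returns the inner results if there are any, otherwise the incoming element itself.
    public static <T> List<T> optionalStep(T incoming, Function<T, List<T>> inner) {
        List<T> results = inner.apply(incoming);
        return results.isEmpty() ? List.of(incoming) : results;
    }

    public static void main(String[] args) {
        // The inner traversal finds something: its results are emitted.
        System.out.println(optionalStep("doc", d -> List.of("ner1", "ner2")));
        // The inner traversal finds nothing: the incoming element passes through.
        System.out.println(optionalStep("doc", d -> List.of()));
    }
}
```

The sketch does not model path history or select('doc'). It only shows why an empty inner traversal no longer drops the document.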

I really can't manage to build https://github.com/opencypher/cypher-for-gremlin correctly, even after installing the Java 8 JDK. I get jars, but for some reason it builds a 1.0.2 snapshot instead of 1.0.4, and when I edit the .xml of ArcadeDB I get compilation warnings about missing symbols such as (but not only) TypeException from import org.opencypher.gremlin.translation.exception.TypeException; and Tokens from org.opencypher.gremlin.translation.Tokens;

If any of you have any experience with building this project by hand I would be really glad to get your help @gramian @lvca

Edit : branch to build is https://github.com/ExtReMLapin/cypher-for-gremlin/tree/fix_optional_match

ExtReMLapin avatar Mar 18 '25 16:03 ExtReMLapin

@robfrank can you help on this?

lvca avatar Mar 18 '25 20:03 lvca

Also, yes, on the branch I posted the tests are not passing, so I just built it using assemble instead of build. That's not a big surprise considering how they fail:

(Some of the failing tests)

matchAndReverseOptionalMatch

java.lang.AssertionError: [Extracted: a1.name, r.name, b2.name] 
Actual and expected should have same size but actual size was:
  <0>
while expected size was:
  <1>
Actual was:
  <[]>
Expected was:
  <[("A", "T", null)]>

    @Test
    @Category(SkipWithCosmosDB.Truncate4096.class)
    public void matchAndReverseOptionalMatch() throws Exception {
        submitAndGet("CREATE (:A {name: 'A'})-[:T {name: 'T'}]->(:B {name: 'B'})");
        List<Map<String, Object>> results = submitAndGet(
            "MATCH (a1)-[r]->() " +
                "WITH r, a1 " +
                "OPTIONAL MATCH (a1)<-[r]-(b2) " +
                "RETURN a1.name, r.name, b2.name"
        );

        assertThat(results)
            .extracting("a1.name", "r.name", "b2.name")
            .containsExactly(tuple("A", "T", null));
    }

optionalMatchOnEmptyGraph


java.lang.AssertionError: [Extracted: n] 
Actual and expected should have same size but actual size was:
  <0>
while expected size was:
  <1>
Actual was:
  <[]>
Expected was:
  <[null]>

    @Test
    public void optionalMatchOnEmptyGraph() throws Exception {
        List<Map<String, Object>> results = submitAndGet(
            "OPTIONAL MATCH (n) " +
                "RETURN n"
        );

        assertThat(results)
            .extracting("n")
            .containsExactly((Object) null);
    }

optionalStartEndNode

java.lang.AssertionError: [Extracted: a, b] 
Expecting:
  <[]>
to contain exactly in any order:
  <[(null, null)]>
but could not find the following elements:
  <[(null, null)]>

    @Test
    public void optionalStartEndNode() {
        List<Map<String, Object>> results = submitAndGet(
            "OPTIONAL MATCH ()-[r:notExisting]-()\n" +
                "RETURN startNode(r) as a, endNode(r) as b");

        assertThat(results)
            .extracting("a", "b")
            .containsExactlyInAnyOrder(tuple(null, null));
    }

I don't want to sound like a greenhorn who just started using a computer and claims there is a bug instead of taking the time to consider that he's the one doing something wrong. But considering the bug in this issue, the proposed rearranged Gremlin query, and the failing tests, I'm starting to believe they wrote those tests after writing the code that generates the query, so the tests exactly match the incorrectly working code.

ExtReMLapin avatar Mar 19 '25 09:03 ExtReMLapin

It's possible. The fact that the project has been abandoned for years and there is nobody to carry it on makes any issue hard to fix. I'd suggest using ArcadeDB SQL or Gremlin, which at least is still active. The best would be a native Cypher parser + executor, but we lack the resources to work on that + the new GQL, which should take over from Cypher.

lvca avatar Mar 20 '25 02:03 lvca