Interaction of GRAPH graph patterns and subqueries

Open · nkaralis opened this issue 1 year ago · 4 comments

Version

5.2.0

Question

Hello,

I have some questions about the interaction of GRAPH graph patterns and subqueries.

I am using version 5.2.0.

Assume the scenario described below.

First, I load a graph into two separate named graphs.

LOAD <https://raw.githubusercontent.com/w3c/rdf-tests/refs/heads/main/sparql/sparql11/functions/data.ttl> INTO GRAPH <http://www.example.org/graph1> ;
LOAD <https://raw.githubusercontent.com/w3c/rdf-tests/refs/heads/main/sparql/sparql11/functions/data.ttl> INTO GRAPH <http://www.example.org/graph2>

Both graphs contain 16 triples.

The query below returns the triples from both graphs, which gives 32 solutions. Here, ?g is always unbound.

SELECT * WHERE {
    GRAPH ?g { 
        {
            SELECT ?s ?p ?o  WHERE {
                ?s ?p ?o
            }
        }
    }
}

The query below also returns 32 results, but in this case ?g is always bound (to either <http://www.example.org/graph1> or <http://www.example.org/graph2>).

SELECT * WHERE {
    GRAPH ?g { 
        {
            SELECT * WHERE {
                ?s ?p ?o
            }
        }
    }
}

I have the following questions:

  • First, why do these queries return different results?
  • Second, why does the second query return 32 results?

For both queries, I was expecting 64 results: the Cartesian product of the subquery results (32 results) and the possible values of ?g (2 named graphs).

Thank you in advance.

nkaralis, Oct 24 '24 12:10

Can you provide details of your storage setup, e.g.:

  • Is this TDB2 or an in-memory dataset?
  • Do you have unionDefaultGraph enabled by any chance?
  • If you are using Fuseki, the config file would be helpful

In algebra terms these two queries compile to different algebra expressions, which likely explains the difference in results.

Your first query yields the following algebra:

(base <http://example/base/>
  (project (?s ?p ?o)
    (quadpattern (quad ?g ?s ?p ?o))))

While your second yields the following algebra:

(base <http://example/base/>
  (quadpattern (quad ?g ?s ?p ?o)))

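Notice that in the first query the explicit SELECT ?s ?p ?o produces a project step directly over the quad pattern, which drops ?g, so ?g is always unbound; with SELECT * in the inner query the project step is omitted from the generated algebra and ?g stays bound. However, I'm not sure whether this is the correct behaviour here; that is probably a question for @afs to answer.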
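A minimal Python sketch of why the first algebra loses ?g, assuming a hypothetical toy dataset (two named graphs with two triples each, not the contents of data.ttl) and hypothetical helper names. `quad_form_with_projection` mirrors `(project (?s ?p ?o) (quadpattern (quad ?g ?s ?p ?o)))`, where the projection runs after ?g has already been bound and therefore discards it; `spec_graph_eval` instead follows the SPARQL 1.1 definition of GRAPH, evaluating the subquery inside each named graph and only then adding ?g. This is a toy model, not Jena's actual executor.

```python
# Hypothetical toy dataset: named graph IRI -> set of (s, p, o) triples.
DATASET = {
    "http://www.example.org/graph1": {("s1", "p1", "o1"), ("s1", "p2", "o2")},
    "http://www.example.org/graph2": {("s1", "p1", "o1"), ("s1", "p2", "o2")},
}


def quadpattern(dataset):
    """(quadpattern (quad ?g ?s ?p ?o)): one solution per quad, ?g bound."""
    return [
        {"g": g, "s": s, "p": p, "o": o}
        for g, triples in sorted(dataset.items())
        for (s, p, o) in sorted(triples)
    ]


def project(rows, variables):
    """(project (vars ...)): keep only the listed variables in each solution."""
    return [{v: row[v] for v in variables if v in row} for row in rows]


def quad_form_with_projection(dataset):
    # Mirrors (project (?s ?p ?o) (quadpattern (quad ?g ?s ?p ?o))):
    # ?g is bound by the quad pattern, then removed by the projection.
    return project(quadpattern(dataset), ["s", "p", "o"])


def spec_graph_eval(dataset, variables):
    # GRAPH ?g { SELECT vars { ?s ?p ?o } }: run the subquery inside each
    # named graph, then extend each of its solutions with ?g = graph name.
    out = []
    for g, triples in sorted(dataset.items()):
        inner = [{"s": s, "p": p, "o": o} for (s, p, o) in sorted(triples)]
        out.extend({**row, "g": g} for row in project(inner, variables))
    return out


if __name__ == "__main__":
    buggy = quad_form_with_projection(DATASET)
    correct = spec_graph_eval(DATASET, ["s", "p", "o"])
    print(len(buggy), all("g" not in r for r in buggy))  # 4 True
    print(len(correct), all("g" in r for r in correct))  # 4 True
```

Both versions return the same number of rows; the only difference is whether ?g survives, which matches the symptom seen with the first query.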


For both queries, I was expecting 64 results: Cartesian product between the results of the subqueries (32 results) and the possible values for ?g (2 named graphs).

That shouldn't ever be the case. A GRAPH ?g clause is logically defined so that the inner pattern is evaluated independently against each named graph in the dataset, and the results are unioned together with ?g bound to the name of the graph each solution came from. So each graph independently yields 16 results, and these union together to give 32 results.
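The per-graph union described above can be sketched in a few lines of Python. This is a toy model, not Jena code: the 16 triples are hypothetical placeholders for data.ttl, and the function names are illustrative. `expected_cartesian` encodes the 64-result expectation from the question, for comparison.

```python
from itertools import product

# Hypothetical stand-in for data.ttl: 16 distinct triples,
# loaded into two named graphs.
TRIPLES = {(f"s{i}", "p", f"o{i}") for i in range(16)}
DATASET = {
    "http://www.example.org/graph1": set(TRIPLES),
    "http://www.example.org/graph2": set(TRIPLES),
}


def eval_graph_var(dataset):
    """GRAPH ?g { ?s ?p ?o }: evaluate the inner pattern against each named
    graph on its own, bind ?g to that graph's name, and union the results."""
    results = []
    for g, triples in dataset.items():
        for (s, p, o) in triples:
            results.append({"g": g, "s": s, "p": p, "o": o})
    return results


def expected_cartesian(dataset):
    """The expectation from the question: every match from every graph,
    crossed with every graph name."""
    all_matches = [t for triples in dataset.values() for t in triples]
    return list(product(dataset.keys(), all_matches))


if __name__ == "__main__":
    print(len(eval_graph_var(DATASET)))      # 32
    print(len(expected_cartesian(DATASET)))  # 64
```

Each graph contributes only its own 16 matches, so the union has 16 + 16 = 32 rows rather than 32 × 2.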

rvesse, Oct 24 '24 13:10

I am using Fuseki with TDB2:

# for starting the server
java -jar fuseki-server.jar --update --tdb2 --loc=databases/testing /endpoint

I am using the default config file found in apache-fuseki-5.2.0/run:

# Licensed under the terms of http://www.apache.org/licenses/LICENSE-2.0

## Fuseki Server configuration file.

@prefix :        <#> .
@prefix fuseki:  <http://jena.apache.org/fuseki#> .
@prefix rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ja:      <http://jena.hpl.hp.com/2005/11/Assembler#> .

[] rdf:type fuseki:Server ;
   # Example::
   # Server-wide query timeout.   
   # 
   # Timeout - server-wide default: milliseconds.
   # Format 1: "1000" -- 1 second timeout
   # Format 2: "10000,60000" -- 10s timeout to first result, 
   #                            then 60s timeout for the rest of query.
   #
   # See javadoc for ARQ.queryTimeout for details.
   # This can also be set on a per dataset basis in the dataset assembler.
   #
   # ja:context [ ja:cxtName "arq:queryTimeout" ;  ja:cxtValue "30000" ] ;

   # Add any custom classes you want to load.
   # Must have a "public static void init()" method.
   # ja:loadClass "your.code.Class" ;   

   # End triples.
   .

That shouldn't ever be the case. A GRAPH ?g clause is logically defined so that the inner pattern is evaluated independently against each named graph in the dataset, and the results are unioned together with ?g bound to the name of the graph each solution came from. So each graph independently yields 16 results, and these union together to give 32 results.

I see. It makes sense, thank you

nkaralis, Oct 24 '24 13:10

@nkaralis, thanks for the report. The unbound ?g is a bug.

@rvesse's analysis is correct (a simpler reproduction is below). TDB is the main user of quad-based execution, but it's available for in-memory datasets as well:

## ==> Q.rq <==
SELECT * {
    GRAPH ?g { 
            SELECT ?s  { ?s ?p ?o }
    }
}
## ==> D.trig <==
PREFIX : <http://example/>

GRAPH :g1 { :s :p :o }

and

  sparql --optimize=false --engine=quad --data D.trig --query Q.rq

giving

--------------------------
| s                  | g |
==========================
| <http://example/s> |   |
--------------------------

afs, Oct 27 '24 10:10

Hello.

I have been trying out some additional queries, including queries with nested GraphGraphPatterns, and have encountered some inconsistencies between Fuseki and arq (both version 5.2.0).

The queries q.rq and q2.rq are executed against the dataset data.trig.

# data.trig
@prefix : <http://example.org/> .

:graph1 { :s1 :p1 :o1 . 
          :s1 :p2 :o2 . 
          :s1 :p3 :o3 . }
:graph2 { :s1 :p1 :o1 .
          :s1 :p2 :o2 . }

# q.rq
SELECT  * WHERE { 
    GRAPH ?g1
    { 
        { 
            SELECT  * WHERE { 
                GRAPH ?g { ?s  ?p  ?o } 
            }
        }
    }
}

# q2.rq
SELECT * WHERE { 
    GRAPH ?g { VALUES ?s { <http://example.org/s1> } } 
}

Results for q.rq

fuseki (note: `g1` is unbound)

s,p,o,g,g1
http://example.org/s1,http://example.org/p1,http://example.org/o1,http://example.org/graph1,
http://example.org/s1,http://example.org/p1,http://example.org/o1,http://example.org/graph2,
http://example.org/s1,http://example.org/p2,http://example.org/o2,http://example.org/graph1,
http://example.org/s1,http://example.org/p2,http://example.org/o2,http://example.org/graph2,
http://example.org/s1,http://example.org/p3,http://example.org/o3,http://example.org/graph1,

arq

./arq --explain --query=q.rq --data=data.trig
14:30:43 INFO  exec            :: QUERY
  SELECT  *
  WHERE
    { GRAPH ?g1
        { { SELECT  *
            WHERE
              { GRAPH ?g
                  { ?s  ?p  ?o }
              }
          }
        }
    }
14:30:43 INFO  exec            :: ALGEBRA
  (graph ?g1
    (graph ?g
      (bgp (triple ?s ?p ?o))))
14:30:43 INFO  exec            :: BGP ::   ?s ?p ?o
14:30:43 INFO  exec            :: Reorder/generic ::   ?s ?p ?o
14:30:43 INFO  exec            :: BGP ::   ?s ?p ?o
14:30:43 INFO  exec            :: Reorder/generic ::   ?s ?p ?o
14:30:43 INFO  exec            :: BGP ::   ?s ?p ?o
14:30:43 INFO  exec            :: Reorder/generic ::   ?s ?p ?o
14:30:43 INFO  exec            :: BGP ::   ?s ?p ?o
14:30:43 INFO  exec            :: Reorder/generic ::   ?s ?p ?o
-------------------------------------------------------------------------------------------------------------------------------------------
| s                       | p                       | o                       | g                           | g1                          |
===========================================================================================================================================
| <http://example.org/s1> | <http://example.org/p1> | <http://example.org/o1> | <http://example.org/graph1> | <http://example.org/graph1> |
| <http://example.org/s1> | <http://example.org/p2> | <http://example.org/o2> | <http://example.org/graph1> | <http://example.org/graph1> |
| <http://example.org/s1> | <http://example.org/p3> | <http://example.org/o3> | <http://example.org/graph1> | <http://example.org/graph1> |
| <http://example.org/s1> | <http://example.org/p1> | <http://example.org/o1> | <http://example.org/graph2> | <http://example.org/graph1> |
| <http://example.org/s1> | <http://example.org/p2> | <http://example.org/o2> | <http://example.org/graph2> | <http://example.org/graph1> |
| <http://example.org/s1> | <http://example.org/p1> | <http://example.org/o1> | <http://example.org/graph1> | <http://example.org/graph2> |
| <http://example.org/s1> | <http://example.org/p2> | <http://example.org/o2> | <http://example.org/graph1> | <http://example.org/graph2> |
| <http://example.org/s1> | <http://example.org/p3> | <http://example.org/o3> | <http://example.org/graph1> | <http://example.org/graph2> |
| <http://example.org/s1> | <http://example.org/p1> | <http://example.org/o1> | <http://example.org/graph2> | <http://example.org/graph2> |
| <http://example.org/s1> | <http://example.org/p2> | <http://example.org/o2> | <http://example.org/graph2> | <http://example.org/graph2> |
-------------------------------------------------------------------------------------------------------------------------------------------
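As a cross-check on the arq output for q.rq, here is a toy Python model of the SPARQL 1.1 evaluation (not Jena's implementation; the function names are hypothetical). The key assumption, taken from the spec's definition of GRAPH, is that an inner GRAPH ?g ranges over all named graphs in the dataset, whatever active graph the enclosing GRAPH ?g1 has set, so each of the 2 values of ?g1 is paired with all 5 quads.

```python
# Named graphs from data.trig, with <http://example.org/> abbreviated as ":".
DATASET = {
    ":graph1": {(":s1", ":p1", ":o1"), (":s1", ":p2", ":o2"), (":s1", ":p3", ":o3")},
    ":graph2": {(":s1", ":p1", ":o1"), (":s1", ":p2", ":o2")},
}


def eval_inner(dataset):
    """SELECT * { GRAPH ?g { ?s ?p ?o } }: GRAPH ?g iterates over the
    dataset's named graphs; an enclosing GRAPH does not restrict them."""
    return [
        {"g": g, "s": s, "p": p, "o": o}
        for g, triples in dataset.items()
        for (s, p, o) in triples
    ]


def eval_q(dataset):
    """GRAPH ?g1 { <inner subquery> }: evaluate the inner pattern once per
    named graph, then extend each solution with ?g1 = that graph's name."""
    return [{**row, "g1": g1} for g1 in dataset for row in eval_inner(dataset)]


if __name__ == "__main__":
    rows = eval_q(DATASET)
    print(len(rows))  # 10
    print(all("g1" in r for r in rows))  # True
```

The model gives 2 × 5 = 10 rows with ?g1 always bound, the same shape as the arq table.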

Results for q2.rq

fuseki

s,g
http://example.org/s1,

arq

./arq --explain --query=q2.rq --data=data.trig
14:53:59 INFO  exec            :: QUERY
  SELECT  *
  WHERE
    { GRAPH ?g
        { VALUES ?s { <http://example.org/s1> } }
    }
14:53:59 INFO  exec            :: ALGEBRA
  (graph ?g
    (table (vars ?s)
      (row [?s <http://example.org/s1>])
    ))
---------------------------------------------------------
| s                       | g                           |
=========================================================
| <http://example.org/s1> | <http://example.org/graph1> |
| <http://example.org/s1> | <http://example.org/graph2> |
---------------------------------------------------------
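The same kind of toy Python model applies to q2.rq (hypothetical helper names, not Jena code). VALUES is an inline table that does not read the active graph, so evaluating it once per named graph yields the same single row each time, extended with that graph's ?g.

```python
# Named graphs in data.trig (the ":" prefix is <http://example.org/>).
NAMED_GRAPHS = ["http://example.org/graph1", "http://example.org/graph2"]


def eval_values():
    """VALUES ?s { <http://example.org/s1> }: an inline table, independent
    of whichever graph is active."""
    return [{"s": "http://example.org/s1"}]


def eval_graph_values(named_graphs):
    """GRAPH ?g { VALUES ... }: evaluate the inline table once per named
    graph and bind ?g to that graph's name."""
    return [{**row, "g": g} for g in named_graphs for row in eval_values()]


if __name__ == "__main__":
    rows = eval_graph_values(NAMED_GRAPHS)
    print(len(rows))  # 2
```

Under this model there is one row per named graph with ?g bound, which matches the arq result rather than the single unbound row from Fuseki.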

Based on the previous responses, the results returned from arq are correct for both queries, right?

nkaralis, Dec 11 '24 14:12