virtuoso-opensource

non-deterministic and incomplete property path ( rdfs:subClassOf ) query result counts

Open JJ-Author opened this issue 7 years ago • 11 comments

Hi, we observed questionable behavior for property path queries: the result counts are non-deterministic or incomplete for the following types of queries.

We tried the query below against the English DBpedia endpoint:

Query A

Prefix dbo: <http://dbpedia.org/ontology/>
SELECT (Count( distinct ?s) as ?number)
WHERE {
    ?s rdf:type/(rdfs:subClassOf{0,2}) dbo:Place .
    ?s rdf:type/(rdfs:subClassOf{0,2}) dbo:Agent .
}  

Problem 1 (non-determinism): Rewriting the query (inserting a comment at a random place) and rerunning it leads to a different count.

Problem 2 (incomplete results): When running Query A on our own Virtuoso backend BUT without the DBpedia ontology loaded (so there is not a single subClassOf triple in the store), the query returned 0. The count has to be much higher, as we verified with Query B (similar to A but without the optional (!) property path).

Query B

Prefix dbo: <http://dbpedia.org/ontology/>
SELECT (Count( distinct ?s) as ?number)
WHERE {
    ?s rdf:type dbo:Place .
    ?s rdf:type dbo:Agent .
}  

Query B returns a number higher than 0 (which means Place and Agent are not disjoint in the data). Query A should return at least that many results in our store, because {0,2} also allows zero subClassOf steps.

JJ-Author, Jan 04 '18 16:01

For Query A run against the http://dbpedia.org/sparql endpoint we host, this is a consequence of the Virtuoso Anytime Query feature, detailed at

http://docs.openlinksw.com/virtuoso/anytimequeries/

which restricts the time a query can run against the public SPARQL endpoint and returns a partial result set once that time is exceeded, as can be seen from the following curl output:

$  curl -I 'http://dbpedia.org/sparql/?default-graph-uri=http%3A%2F%2Fdbpedia.org&query=Prefix+dbo%3A+%3Chttp%3A%2F%2Fdbpedia.org%2Fontology%2F%3E%0D%0ASELECT+%28Count%28+distinct+%3Fs%29+as+%3Fnumber%29%0D%0AWHERE+%7B%0D%0A++++%3Fs+rdf%3Atype%2F%28rdfs%3AsubClassOf%7B0%2C2%7D%29+dbo%3APlace+.%0D%0A++++%3Fs+rdf%3Atype%2F%28rdfs%3AsubClassOf%7B0%2C2%7D%29+dbo%3AAgent+.%0D%0A%7D++%0D%0A&format=text%2Fhtml&CXML_redir_for_subjs=121&CXML_redir_for_hrefs=&timeout=30000&debug=on&run=+Run+Query+++'
HTTP/1.1 200 OK
Date: Fri, 05 Jan 2018 10:35:50 GMT
Content-Type: text/html; charset=UTF-8
Content-Length: 121
Connection: keep-alive
Server: Virtuoso/07.20.3224 (Linux) i686-generic-linux-glibc212-64  VDB
X-SPARQL-default-graph: http://dbpedia.org
X-SQL-State: S1TAT
X-SQL-Message: RC...: Returning incomplete results, query interrupted by result timeout.  Activity:  2.295M rnd  10.05M seq  2.029M same seg   219.2K same pg  25.12K same par      1 disk      0 spec disk      0B /      0
X-Exec-Milliseconds: 30549
X-Exec-DB-Activity: 2.295M rnd  10.05M seq  2.029M same seg   219.2K same pg  25.12K same par      1 disk      0 spec disk      0B /      0 messages      0 fork
Expires: Fri, 12 Jan 2018 10:35:50 GMT
Cache-Control: max-age=604800
Access-Control-Allow-Origin: *
Access-Control-Allow-Credentials: true
Access-Control-Allow-Methods: HEAD, GET, POST, OPTIONS
Access-Control-Allow-Headers: DNT,X-CustomHeader,Keep-Alive,User-Agent,X-Requested-With,If-Modified-Since,Cache-Control,Content-Type,Accept-Encoding
Accept-Ranges: bytes
$

Note the X-SQL-Message: RC...: Returning incomplete results, query interrupted by result timeout. header, which indicates that the Anytime timeout was hit and an incomplete result was returned.

Thus applications need to handle this condition or you can set up your own local DBpedia instance with no Anytime timeout query restrictions.
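As a minimal sketch of how an application might handle this condition, the Python helper below inspects the response headers for the X-SQL-State value S1TAT seen in the curl output above. The function name and the fallback text are illustrative assumptions, not part of any Virtuoso client API, and the sketch assumes the endpoint always flags partial results with this header pair.

```python
from typing import Mapping, Optional

# Value of the X-SQL-State header in the curl output above.
ANYTIME_SQL_STATE = "S1TAT"


def anytime_timeout_message(headers: Mapping[str, str]) -> Optional[str]:
    """Return the X-SQL-Message text if the response is flagged as partial, else None.

    `headers` is any mapping of HTTP response header names to values
    (hypothetical helper; the name is not from Virtuoso).
    """
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    if lowered.get("x-sql-state", "").strip() == ANYTIME_SQL_STATE:
        message = lowered.get("x-sql-message", "").strip()
        return message or "incomplete result (no message given)"
    return None


if __name__ == "__main__":
    partial = {
        "X-SQL-State": "S1TAT",
        "X-SQL-Message": "RC...: Returning incomplete results, "
                         "query interrupted by result timeout.",
    }
    print(anytime_timeout_message(partial))                        # the message text
    print(anytime_timeout_message({"Content-Type": "text/html"}))  # None
```

A caller could pass in the header mapping of whatever HTTP client it uses and treat a non-None return value as "do not trust this count".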

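The result set is the same when the query is re-executed without the comment or any other change, because there is an nginx cache in front of the real endpoint which has the result cached and returns the value immediately from the cache. A textual change such as an added comment presumably misses that cache, so the query is executed again and can be interrupted at a different point.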

I don't understand your Problem 2. I presume you have some part of the DBpedia datasets loaded locally, but I am not sure why you indicate the DBpedia ontology is not loaded. Are you able to demonstrate the issue against our public DBpedia endpoint so that the problem can be seen first hand? Alternatively, note that we have an Amazon AWS DBpedia AMI, which is a replica of the public endpoint we host (but with no nginx cache) and can be instantiated as detailed at https://aws.amazon.com/marketplace/pp/B012DSCFEK ...

HughWilliams, Jan 05 '18 12:01

Hi Hugh, thanks a lot for your fast reply. I did not know about that anytime query feature, so I'm sorry for that. This is just my personal opinion but it might be useful to show a message or warning about incomplete results in the HTML view or even as comments in RDF.

To come back to question 2: I prepared a little example.

  1. Load the RDF resource http://dbpedia.org/resource/Library_of_Parliament into the graph http://property.path.test.single/
  2. now run

Query P1

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT distinct ?s 
WHERE {
GRAPH <http://property.path.test.single/> {
    ?s rdf:type dbo:Place .
    ?s rdf:type dbo:Agent .
 }
} 
Limit 100

Query P2:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT distinct ?s 
WHERE {
GRAPH <http://property.path.test.single/> {
    ?s rdf:type/rdfs:subClassOf* dbo:Place .
    ?s rdf:type/rdfs:subClassOf* dbo:Agent .
 }
} 
Limit 100

Query P3:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT distinct ?s 
WHERE {
GRAPH <http://property.path.test.single/> {
    ?s rdf:type/(rdfs:subClassOf{0,2})  dbo:Place .
    ?s rdf:type/(rdfs:subClassOf{0,2})  dbo:Agent .
 }
} 
Limit 100

The results on our Virtuoso version 07.20.3217 on Linux (x86_64-pc-linux-gnu), Single Server Edition (using the latest tenforce docker image):

Query P1: http://dbpedia.org/resource/Library_of_Parliament
Query P2 and P3: nothing

Based on our understanding of property paths, and on a comparison with another RDF store, we cannot explain the missing results for queries P2 and P3.
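To make the expectation concrete, here is a small pure-Python sketch of how rdf:type/rdfs:subClassOf* and rdf:type/(rdfs:subClassOf{0,2}) are usually read: zero subClassOf steps are allowed, so a direct rdf:type match must still count when the graph holds no subClassOf triples. The two triples are a simplified stand-in for the loaded resource (assumed from the P1 result), and the helper names are hypothetical. This is not how Virtuoso evaluates the query.

```python
from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]

# Simplified stand-in for the test graph: no rdfs:subClassOf triples at all.
TRIPLES: List[Triple] = [
    ("dbr:Library_of_Parliament", "rdf:type", "dbo:Place"),
    ("dbr:Library_of_Parliament", "rdf:type", "dbo:Agent"),
]


def superclasses(cls: str, triples: List[Triple],
                 max_hops: Optional[int] = None) -> Set[str]:
    """Classes reachable from cls via 0..max_hops subClassOf steps (None = unbounded)."""
    reached = {cls}   # the zero-length path: every class reaches itself
    frontier = {cls}
    hops = 0
    while frontier and (max_hops is None or hops < max_hops):
        # Follow one more rdfs:subClassOf step from the current frontier.
        frontier = {o for s, p, o in triples
                    if p == "rdfs:subClassOf" and s in frontier} - reached
        reached |= frontier
        hops += 1
    return reached


def instances_of(target: str, triples: List[Triple],
                 max_hops: Optional[int] = None) -> Set[str]:
    """Subjects ?s matching ?s rdf:type/rdfs:subClassOf{0,max_hops} target."""
    return {s for s, p, o in triples
            if p == "rdf:type" and target in superclasses(o, triples, max_hops)}


def place_and_agent(triples: List[Triple],
                    max_hops: Optional[int] = None) -> Set[str]:
    """Subjects matching both path patterns, as in P2 (unbounded) and P3 (max_hops=2)."""
    return (instances_of("dbo:Place", triples, max_hops)
            & instances_of("dbo:Agent", triples, max_hops))


if __name__ == "__main__":
    print(place_and_agent(TRIPLES))     # P2 analogue -> {'dbr:Library_of_Parliament'}
    print(place_and_agent(TRIPLES, 2))  # P3 analogue -> {'dbr:Library_of_Parliament'}
```

Under this reading, P2 and P3 can only return more subjects than P1, never fewer.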

JJ-Author, Jan 05 '18 15:01

any news on that?

JJ-Author, Jan 22 '18 13:01

I have seen this problem also. In my case, a query of the form ...

  ?x rdf:type ?className .
  ?className rdfs:subClassOf* prefix1:ClassName1 .
  ?x prefix:predicate1 ?y .
  ?y rdf:type prefix2:ClassName2 .
  ...  # More variables and statements are defined

has the same binding sets returned REGARDLESS of whether the line ?className rdfs:subClassOf* prefix1:ClassName1 is present in the query or not. Obviously, this means that lots of individuals whose class is not a sub-class of prefix1:ClassName1 are also returned.

AndreaWesterinen, Jun 20 '18 22:06

@AndreaWesterinen Can you please provide a simple test case for recreating the problem you are observing?

HughWilliams, Jun 24 '18 13:06

@HughWilliams I cannot share my current db. Do you have any obfuscation capabilities? Otherwise, I will try to create a simple test case. The error is 100% reproducible for my current data.

AndreaWesterinen, Jun 30 '18 17:06

@AndreaWesterinen: We don't have obfuscation per se, but the stat_export() function can provide statistics on the queries that have been executed against a given database without the need to provide its data. Development can analyze its output to determine the cause of problems with various queries.

That said, a simple test case would be ideal ...

HughWilliams, Jul 02 '18 12:07

I created a simple scenario in my post from January. Please see my post again...

JJ-Author, Jul 03 '18 13:07

@AndreaWesterinen that is interesting. you have too many results? for me it was vice versa. see my "minimal non-working example" above

JJ-Author, Jul 20 '18 15:07

Is this still an issue?

metasj, Feb 24 '23 19:02

@metasj: We were never able to recreate this issue back in 2018.

Thus, if you are seeing such an issue, do you have a description and a test case for recreating it with the latest Virtuoso open source build?

HughWilliams, Feb 24 '23 22:02