Inconsistent anytime response headers behaviour on SPARQL endpoint
We are trying to query a dataset while taking care of anytime feature (we indeed want to get partial results rather than errors in case we hit some limits).
Queries are executed on the same virtuoso instance at http://kaiko.getalp.org/sparql
Q1:
SELECT ?lemma, COUNT(?written_trans) AS ?nb_translations
WHERE {
?lexentry ontolex:canonicalForm [
ontolex:writtenRep ?lemma
] .
FILTER (lang(?lemma) = 'en')
?trans dbnary:isTranslationOf ?lexentry ;
dbnary:writtenForm ?written_trans .
} LIMIT 1000
Q2:
SELECT distinct ?A ?p ?B
WHERE {
?a a/rdfs:subClassOf* ?A.
?a ?p ?b.
OPTIONAL { ?b rdfs:subClassOf ?B }}
Without timeout value specified
- Q1 succeeds after 22 seconds spent (which is less than the server timeout of 60 seconds).
time curl -I 'http://kaiko.getalp.org/sparql?default-graph-uri=&query=SELECT+%3Flemma%2C+COUNT%28%3Fwritten_trans%29+AS+%3Fnb_translations%0D%0AWHERE+%7B%0D%0A++++%3Flexentry+ontolex%3AcanonicalForm+%5B%0D%0A++++++++ontolex%3AwrittenRep+%3Flemma+%0D%0A++++%5D+.%0D%0A++++FILTER+%28lang%28%3Flemma%29+%3D+%27en%27%29%0D%0A++++%3Ftrans+dbnary%3AisTranslationOf+%3Flexentry+%3B%0D%0A++++++++dbnary%3AwrittenForm+%3Fwritten_trans+.%0D%0A%7D+LIMIT+1000%0D%0A&should-sponge=&format=text%2Fhtml&timeout=0&debug=on&run=+Run+Query+'
HTTP/1.1 200 OK
Date: Wed, 27 Apr 2022 10:22:39 GMT
Server: Virtuoso/07.20.3230 (Linux) x86_64-pc-linux-gnu
Accept-Ranges: bytes
Content-Type: text/html; charset=UTF-8
Content-Length: 82087
curl -I 0.00s user 0.01s system 0% cpu 22.525 total
- Q2 times out after 60 seconds
time curl -I 'http://kaiko.getalp.org/sparql?default-graph-uri=&query=select+distinct+%3FA+%3Fp+%3FB+where+%7B%3Fa+a%2Frdfs%3AsubClassOf*+%3FA.+%3Fa+%3Fp+%3Fb.+OPTIONAL+%7B+%3Fb+rdfs%3AsubClassOf+%3FB+%7D%7D&should-sponge=&format=text%2Fhtml&timeout=&debug=on&run=+Run+Query+'
HTTP/1.1 500 SPARQL Request Failed
Date: Wed, 27 Apr 2022 11:44:09 GMT
Server: Virtuoso/07.20.3230 (Linux) x86_64-pc-linux-gnu
Accept-Ranges: bytes
Content-Type: text/plain; charset=UTF-8
Content-Length: 168
Connection: close
curl -I 0.00s user 0.01s system 0% cpu 1:01.65 total
So far so good.
With timeout value specified
- Q1 indeeds hits the timeout of 1 second and give back the expected
X-SQL-State: S1TATwith additional headers.
time curl -I 'http://kaiko.getalp.org/sparql?default-graph-uri=&query=SELECT+%3Flemma%2C+COUNT%28%3Fwritten_trans%29+AS+%3Fnb_translations%0D%0AWHERE+%7B%0D%0A++++%3Flexentry+ontolex%3AcanonicalForm+%5B%0D%0A++++++++ontolex%3AwrittenRep+%3Flemma+%0D%0A++++%5D+.%0D%0A++++FILTER+%28lang%28%3Flemma%29+%3D+%27en%27%29%0D%0A++++%3Ftrans+dbnary%3AisTranslationOf+%3Flexentry+%3B%0D%0A++++++++dbnary%3AwrittenForm+%3Fwritten_trans+.%0D%0A%7D+LIMIT+1000%0D%0A&should-sponge=&format=text%2Fhtml&timeout=1001&debug=on&run=+Run+Query+'
HTTP/1.1 200 OK
Date: Wed, 27 Apr 2022 10:23:53 GMT
Server: Virtuoso/07.20.3230 (Linux) x86_64-pc-linux-gnu
Accept-Ranges: bytes
X-SQL-State: S1TAT
X-SQL-Message: RC...: Returning incomplete results, query interrupted by result timeout. Activity: 4.649M rnd 2.042M seq 2.227M same seg 1.086M same pg 119.5K same par 0 disk 0 spec disk 0B / 0
X-Exec-Milliseconds: 3142
X-Exec-DB-Activity: 4.649M rnd 2.042M seq 2.227M same seg 1.086M same pg 119.5K same par 0 disk 0 spec disk 0B / 0 messages 0 fork
Content-Type: text/html; charset=UTF-8
Content-Length: 81684
- However, Q2 also hits the timeout and tries to return partial data (as expected by anytime query behaviour), but it does not give any response header stating so.
time curl -I 'http://kaiko.getalp.org/sparql?default-graph-uri=&query=select+distinct+%3FA+%3Fp+%3FB+where+%7B%3Fa+a%2Frdfs%3AsubClassOf*+%3FA.+%3Fa+%3Fp+%3Fb.+OPTIONAL+%7B+%3Fb+rdfs%3AsubClassOf+%3FB+%7D%7D&should-sponge=&format=text%2Fhtml&timeout=1001&debug=on&run=+Run+Query+'
HTTP/1.1 200 OK
Date: Wed, 27 Apr 2022 10:37:19 GMT
Server: Virtuoso/07.20.3230 (Linux) x86_64-pc-linux-gnu
Accept-Ranges: bytes
Content-Type: text/html; charset=UTF-8
Content-Length: 192742
curl -I 0.00s user 0.01s system 0% cpu 3.217 total
Expected behaviour: both query hit timeout in anytime query settings and should both return X-SQL-State: S1TAT with additional headers.
First thing, I see that your endpoint is running version 07.20.3230 (a/k/a 7.2.6.1), built Sep 27, 2020 from git_head 840b468fc, committed on Sep 1, 2020, probably from the stable/7 branch.
I would suggest you test against a current build from the develop/7 branch (which we will be shipping as/to stable/7 in the next few weeks). If we can quickly confirm this behavior in the latest develop/7 branch, we may be able to get a patch in before that stable/7 push.
I recompiled virtuoso on my debian machine with :
./configure --with-layout=debian --with-readline --enable-dbpedia-vad --enable-fct-vad --enable-rdfmappers-vad --with-port=2222
make
sudo make install
Then an access to http://kaiko.getalp.org/sparql led to Virtuoso 42001 Error SR185: Undefined procedure WS.WS.SPARQL_ENDPOINT_GENERATE_FORM
I regenarated the whole db from scratch using the newly compiled virtuoso, same problem. Any idea ?
When switching from stable/7 to develop/7 some of the build environment needs to be regenerated.
Please run the following command:
$ bash autogen.sh
Followed by the following commands to rebuild and install VOS:
$ make
$ sudo make install
Yes, I did that, but the distclean seams to keep some files. So I fetched a clean git clone and the compilation went well. I'll try the queries now.
With Virtuoso version 07.20.3233 on Linux (x86_64-pc-linux-gnu) Single Server Edition (142 GB total memory, 17 GB memory in use) :
Theoretical results (obtained through isql with no timeout) : Q1:
- time 27s
- results: 1000 rows (obviously...) Q2:
- time: 9 m 36s
- results : 3140 rows
Through SPARQL endpoint without timeout Q1:
time curl -I 'http://kaiko.getalp.org/sparql?default-graph-uri=&query=SELECT+%3Flemma%2C+COUNT%28%3Fwritten_trans%29+AS+%3Fnb_translations%0D%0AWHERE+%7B%0D%0A++++%3Flexentry+ontolex%3AcanonicalForm+%5B%0D%0A++++++++ontolex%3AwrittenRep+%3Flemma+%0D%0A++++%5D+.%0D%0A++++FILTER+%28lang%28%3Flemma%29+%3D+%27en%27%29%0D%0A++++%3Ftrans+dbnary%3AisTranslationOf+%3Flexentry+%3B%0D%0A++++++++dbnary%3AwrittenForm+%3Fwritten_trans+.%0D%0A%7D+LIMIT+1000%0D%0A&should-sponge=&format=text%2Fhtml&timeout=0&debug=on&run=+Run+Query+'
HTTP/1.1 200 OK
Date: Thu, 28 Apr 2022 06:11:47 GMT
Server: Virtuoso/07.20.3233 (Linux) x86_64-pc-linux-gnu
Accept-Ranges: bytes
Content-disposition: filename=sparql_2022-04-28_06-11-47Z.html
Content-Type: text/html; charset=UTF-8
Content-Length: 82941
curl -I 0.00s user 0.02s system 0% cpu 27.210 total
This is analoguous to the previous result and expected.
Q2:
time curl -I 'http://kaiko.getalp.org/sparql?default-graph-uri=&query=select+distinct+%3FA+%3Fp+%3FB+where+%7B%3Fa+a%2Frdfs%3AsubClassOf*+%3FA.+%3Fa+%3Fp+%3Fb.+OPTIONAL+%7B+%3Fb+rdfs%3AsubClassOf+%3FB+%7D%7D&should-sponge=&format=text%2Fhtml&timeout=&debug=on&run=+Run+Query+'
HTTP/1.1 200 OK
Date: Thu, 28 Apr 2022 06:13:14 GMT
Server: Virtuoso/07.20.3233 (Linux) x86_64-pc-linux-gnu
Accept-Ranges: bytes
Content-disposition: filename=sparql_2022-04-28_06-13-14Z.html
Content-Type: text/html; charset=UTF-8
Content-Length: 314378
curl -I 0.00s user 0.01s system 0% cpu 1:00.34 total
Here the default server timeout is hit, but now the request does not fail anymore but returns partial results (~850 rows) without any adequate response headers. This is unexpected in this version while it was failing (expectedly) in previous version.
Through SPARQL endpoint with timeout Q1:
time curl -I 'http://kaiko.getalp.org/sparql?default-graph-uri=&query=SELECT+%3Flemma%2C+COUNT%28%3Fwritten_trans%29+AS+%3Fnb_translations%0D%0AWHERE+%7B%0D%0A++++%3Flexentry+ontolex%3AcanonicalForm+%5B%0D%0A++++++++ontolex%3AwrittenRep+%3Flemma+%0D%0A++++%5D+.%0D%0A++++FILTER+%28lang%28%3Flemma%29+%3D+%27en%27%29%0D%0A++++%3Ftrans+dbnary%3AisTranslationOf+%3Flexentry+%3B%0D%0A++++++++dbnary%3AwrittenForm+%3Fwritten_trans+.%0D%0A%7D+LIMIT+1000%0D%0A&should-sponge=&format=text%2Fhtml&timeout=1001&debug=on&run=+Run+Query+'
HTTP/1.1 200 OK
Date: Thu, 28 Apr 2022 06:22:59 GMT
Server: Virtuoso/07.20.3233 (Linux) x86_64-pc-linux-gnu
Accept-Ranges: bytes
X-SQL-State: S1TAT
X-SQL-Message: RC...: Returning incomplete results, query interrupted by result timeout. Activity: 4.156M rnd 1.426M seq 1.616M same seg 1.183M same pg 110.8K same par 0 disk 0 spec disk 0B / 0
X-Exec-Milliseconds: 2981
X-Exec-DB-Activity: 4.156M rnd 1.426M seq 1.616M same seg 1.183M same pg 110.8K same par 0 disk 0 spec disk 0B / 0 messages 0 fork
Content-disposition: filename=sparql_2022-04-28_06-22-59Z.html
Content-Type: text/html; charset=UTF-8
Content-Length: 82566
curl -I 0.00s user 0.01s system 0% cpu 3.080 total
This is analoguous to the previous result and expected. Partial results returned and response headers given back stating so.
Q2:
time curl -I 'http://kaiko.getalp.org/sparql?default-graph-uri=&query=select+distinct+%3FA+%3Fp+%3FB+where+%7B%3Fa+a%2Frdfs%3AsubClassOf*+%3FA.+%3Fa+%3Fp+%3Fb.+OPTIONAL+%7B+%3Fb+rdfs%3AsubClassOf+%3FB+%7D%7D&should-sponge=&format=text%2Fhtml&timeout=1001&debug=on&run=+Run+Query+'
HTTP/1.1 200 OK
Date: Thu, 28 Apr 2022 06:23:25 GMT
Server: Virtuoso/07.20.3233 (Linux) x86_64-pc-linux-gnu
Accept-Ranges: bytes
Content-disposition: filename=sparql_2022-04-28_06-23-25Z.html
Content-Type: text/html; charset=UTF-8
Content-Length: 72651
curl -I 0.00s user 0.01s system 1% cpu 1.709 total
This is unexpected as the results are indeed partial after hitting the timeout (~217 rows) but no headers is given to state so.
We will attempt to recreate this issue in house and see what is going on.
If you need to recreate the exact dbnary server settings, all dataset is available at http://kaiko.getalp.org/static/ontolex/latest/ and the database is created using script available at https://gitlab.com/gilles.serasset/dbnary/-/blob/master/kaiko/virtuoso/rotate-virtuoso-db.sh
(Sorry, I do not have a docker setting available (yet).)