virtuoso-opensource icon indicating copy to clipboard operation
virtuoso-opensource copied to clipboard

Inconsistent anytime response headers behaviour on SPARQL endpoint

Open serasset opened this issue 3 years ago • 7 comments

We are trying to query a dataset while taking care of anytime feature (we indeed want to get partial results rather than errors in case we hit some limits).

Queries are executed on the same virtuoso instance at http://kaiko.getalp.org/sparql

Q1:

SELECT ?lemma, COUNT(?written_trans) AS ?nb_translations
WHERE {
    ?lexentry ontolex:canonicalForm [
        ontolex:writtenRep ?lemma 
    ] .
    FILTER (lang(?lemma) = 'en')
    ?trans dbnary:isTranslationOf ?lexentry ;
        dbnary:writtenForm ?written_trans .
} LIMIT 1000

Q2:

SELECT distinct ?A ?p ?B 
WHERE {
    ?a a/rdfs:subClassOf* ?A. 
    ?a ?p ?b. 
    OPTIONAL { ?b rdfs:subClassOf ?B }}

Without timeout value specified

  • Q1 succeeds after 22 seconds spent (which is less than the server timeout of 60 seconds).
time curl -I 'http://kaiko.getalp.org/sparql?default-graph-uri=&query=SELECT+%3Flemma%2C+COUNT%28%3Fwritten_trans%29+AS+%3Fnb_translations%0D%0AWHERE+%7B%0D%0A++++%3Flexentry+ontolex%3AcanonicalForm+%5B%0D%0A++++++++ontolex%3AwrittenRep+%3Flemma+%0D%0A++++%5D+.%0D%0A++++FILTER+%28lang%28%3Flemma%29+%3D+%27en%27%29%0D%0A++++%3Ftrans+dbnary%3AisTranslationOf+%3Flexentry+%3B%0D%0A++++++++dbnary%3AwrittenForm+%3Fwritten_trans+.%0D%0A%7D+LIMIT+1000%0D%0A&should-sponge=&format=text%2Fhtml&timeout=0&debug=on&run=+Run+Query+'
HTTP/1.1 200 OK
Date: Wed, 27 Apr 2022 10:22:39 GMT
Server: Virtuoso/07.20.3230 (Linux) x86_64-pc-linux-gnu
Accept-Ranges: bytes
Content-Type: text/html; charset=UTF-8
Content-Length: 82087

curl -I   0.00s user 0.01s system 0% cpu 22.525 total
  • Q2 times out after 60 seconds
time curl -I 'http://kaiko.getalp.org/sparql?default-graph-uri=&query=select+distinct+%3FA+%3Fp+%3FB+where+%7B%3Fa+a%2Frdfs%3AsubClassOf*+%3FA.+%3Fa+%3Fp+%3Fb.+OPTIONAL+%7B+%3Fb+rdfs%3AsubClassOf+%3FB+%7D%7D&should-sponge=&format=text%2Fhtml&timeout=&debug=on&run=+Run+Query+'
HTTP/1.1 500 SPARQL Request Failed
Date: Wed, 27 Apr 2022 11:44:09 GMT
Server: Virtuoso/07.20.3230 (Linux) x86_64-pc-linux-gnu
Accept-Ranges: bytes
Content-Type: text/plain; charset=UTF-8
Content-Length: 168
Connection: close

curl -I   0.00s user 0.01s system 0% cpu 1:01.65 total

So far so good.

With timeout value specified

  • Q1 indeeds hits the timeout of 1 second and give back the expected X-SQL-State: S1TAT with additional headers.
time curl -I 'http://kaiko.getalp.org/sparql?default-graph-uri=&query=SELECT+%3Flemma%2C+COUNT%28%3Fwritten_trans%29+AS+%3Fnb_translations%0D%0AWHERE+%7B%0D%0A++++%3Flexentry+ontolex%3AcanonicalForm+%5B%0D%0A++++++++ontolex%3AwrittenRep+%3Flemma+%0D%0A++++%5D+.%0D%0A++++FILTER+%28lang%28%3Flemma%29+%3D+%27en%27%29%0D%0A++++%3Ftrans+dbnary%3AisTranslationOf+%3Flexentry+%3B%0D%0A++++++++dbnary%3AwrittenForm+%3Fwritten_trans+.%0D%0A%7D+LIMIT+1000%0D%0A&should-sponge=&format=text%2Fhtml&timeout=1001&debug=on&run=+Run+Query+'
HTTP/1.1 200 OK
Date: Wed, 27 Apr 2022 10:23:53 GMT
Server: Virtuoso/07.20.3230 (Linux) x86_64-pc-linux-gnu
Accept-Ranges: bytes
X-SQL-State: S1TAT
X-SQL-Message: RC...: Returning incomplete results, query interrupted by result timeout.  Activity:  4.649M rnd  2.042M seq  2.227M same seg   1.086M same pg  119.5K same par      0 disk      0 spec disk      0B /      0
X-Exec-Milliseconds: 3142
X-Exec-DB-Activity: 4.649M rnd  2.042M seq  2.227M same seg   1.086M same pg  119.5K same par      0 disk      0 spec disk      0B /      0 messages      0 fork
Content-Type: text/html; charset=UTF-8
Content-Length: 81684
  • However, Q2 also hits the timeout and tries to return partial data (as expected by anytime query behaviour), but it does not give any response header stating so.
time curl -I 'http://kaiko.getalp.org/sparql?default-graph-uri=&query=select+distinct+%3FA+%3Fp+%3FB+where+%7B%3Fa+a%2Frdfs%3AsubClassOf*+%3FA.+%3Fa+%3Fp+%3Fb.+OPTIONAL+%7B+%3Fb+rdfs%3AsubClassOf+%3FB+%7D%7D&should-sponge=&format=text%2Fhtml&timeout=1001&debug=on&run=+Run+Query+'
HTTP/1.1 200 OK
Date: Wed, 27 Apr 2022 10:37:19 GMT
Server: Virtuoso/07.20.3230 (Linux) x86_64-pc-linux-gnu
Accept-Ranges: bytes
Content-Type: text/html; charset=UTF-8
Content-Length: 192742

curl -I   0.00s user 0.01s system 0% cpu 3.217 total

Expected behaviour: both query hit timeout in anytime query settings and should both return X-SQL-State: S1TAT with additional headers.

serasset avatar Apr 27 '22 11:04 serasset

First thing, I see that your endpoint is running version 07.20.3230 (a/k/a 7.2.6.1), built Sep 27, 2020 from git_head 840b468fc, committed on Sep 1, 2020, probably from the stable/7 branch.

I would suggest you test against a current build from the develop/7 branch (which we will be shipping as/to stable/7 in the next few weeks). If we can quickly confirm this behavior in the latest develop/7 branch, we may be able to get a patch in before that stable/7 push.

TallTed avatar Apr 27 '22 13:04 TallTed

I recompiled virtuoso on my debian machine with :

./configure --with-layout=debian --with-readline --enable-dbpedia-vad --enable-fct-vad --enable-rdfmappers-vad --with-port=2222
make
sudo make install

Then an access to http://kaiko.getalp.org/sparql led to Virtuoso 42001 Error SR185: Undefined procedure WS.WS.SPARQL_ENDPOINT_GENERATE_FORM

I regenarated the whole db from scratch using the newly compiled virtuoso, same problem. Any idea ?

serasset avatar Apr 27 '22 19:04 serasset

When switching from stable/7 to develop/7 some of the build environment needs to be regenerated.

Please run the following command:

$ bash autogen.sh

Followed by the following commands to rebuild and install VOS:

$ make

$ sudo make install

pkleef avatar Apr 27 '22 20:04 pkleef

Yes, I did that, but the distclean seams to keep some files. So I fetched a clean git clone and the compilation went well. I'll try the queries now.

serasset avatar Apr 28 '22 06:04 serasset

With Virtuoso version 07.20.3233 on Linux (x86_64-pc-linux-gnu) Single Server Edition (142 GB total memory, 17 GB memory in use) :

Theoretical results (obtained through isql with no timeout) : Q1:

  • time 27s
  • results: 1000 rows (obviously...) Q2:
  • time: 9 m 36s
  • results : 3140 rows

Through SPARQL endpoint without timeout Q1:

time curl -I 'http://kaiko.getalp.org/sparql?default-graph-uri=&query=SELECT+%3Flemma%2C+COUNT%28%3Fwritten_trans%29+AS+%3Fnb_translations%0D%0AWHERE+%7B%0D%0A++++%3Flexentry+ontolex%3AcanonicalForm+%5B%0D%0A++++++++ontolex%3AwrittenRep+%3Flemma+%0D%0A++++%5D+.%0D%0A++++FILTER+%28lang%28%3Flemma%29+%3D+%27en%27%29%0D%0A++++%3Ftrans+dbnary%3AisTranslationOf+%3Flexentry+%3B%0D%0A++++++++dbnary%3AwrittenForm+%3Fwritten_trans+.%0D%0A%7D+LIMIT+1000%0D%0A&should-sponge=&format=text%2Fhtml&timeout=0&debug=on&run=+Run+Query+'

HTTP/1.1 200 OK
Date: Thu, 28 Apr 2022 06:11:47 GMT
Server: Virtuoso/07.20.3233 (Linux) x86_64-pc-linux-gnu
Accept-Ranges: bytes
Content-disposition: filename=sparql_2022-04-28_06-11-47Z.html
Content-Type: text/html; charset=UTF-8
Content-Length: 82941

curl -I   0.00s user 0.02s system 0% cpu 27.210 total

This is analoguous to the previous result and expected.

Q2:

time curl -I 'http://kaiko.getalp.org/sparql?default-graph-uri=&query=select+distinct+%3FA+%3Fp+%3FB+where+%7B%3Fa+a%2Frdfs%3AsubClassOf*+%3FA.+%3Fa+%3Fp+%3Fb.+OPTIONAL+%7B+%3Fb+rdfs%3AsubClassOf+%3FB+%7D%7D&should-sponge=&format=text%2Fhtml&timeout=&debug=on&run=+Run+Query+'

HTTP/1.1 200 OK
Date: Thu, 28 Apr 2022 06:13:14 GMT
Server: Virtuoso/07.20.3233 (Linux) x86_64-pc-linux-gnu
Accept-Ranges: bytes
Content-disposition: filename=sparql_2022-04-28_06-13-14Z.html
Content-Type: text/html; charset=UTF-8
Content-Length: 314378

curl -I   0.00s user 0.01s system 0% cpu 1:00.34 total

Here the default server timeout is hit, but now the request does not fail anymore but returns partial results (~850 rows) without any adequate response headers. This is unexpected in this version while it was failing (expectedly) in previous version.

Through SPARQL endpoint with timeout Q1:

time curl -I 'http://kaiko.getalp.org/sparql?default-graph-uri=&query=SELECT+%3Flemma%2C+COUNT%28%3Fwritten_trans%29+AS+%3Fnb_translations%0D%0AWHERE+%7B%0D%0A++++%3Flexentry+ontolex%3AcanonicalForm+%5B%0D%0A++++++++ontolex%3AwrittenRep+%3Flemma+%0D%0A++++%5D+.%0D%0A++++FILTER+%28lang%28%3Flemma%29+%3D+%27en%27%29%0D%0A++++%3Ftrans+dbnary%3AisTranslationOf+%3Flexentry+%3B%0D%0A++++++++dbnary%3AwrittenForm+%3Fwritten_trans+.%0D%0A%7D+LIMIT+1000%0D%0A&should-sponge=&format=text%2Fhtml&timeout=1001&debug=on&run=+Run+Query+'

HTTP/1.1 200 OK
Date: Thu, 28 Apr 2022 06:22:59 GMT
Server: Virtuoso/07.20.3233 (Linux) x86_64-pc-linux-gnu
Accept-Ranges: bytes
X-SQL-State: S1TAT
X-SQL-Message: RC...: Returning incomplete results, query interrupted by result timeout.  Activity:  4.156M rnd  1.426M seq  1.616M same seg   1.183M same pg  110.8K same par      0 disk      0 spec disk      0B /      0
X-Exec-Milliseconds: 2981
X-Exec-DB-Activity: 4.156M rnd  1.426M seq  1.616M same seg   1.183M same pg  110.8K same par      0 disk      0 spec disk      0B /      0 messages      0 fork
Content-disposition: filename=sparql_2022-04-28_06-22-59Z.html
Content-Type: text/html; charset=UTF-8
Content-Length: 82566

curl -I   0.00s user 0.01s system 0% cpu 3.080 total

This is analoguous to the previous result and expected. Partial results returned and response headers given back stating so.

Q2:

time curl -I 'http://kaiko.getalp.org/sparql?default-graph-uri=&query=select+distinct+%3FA+%3Fp+%3FB+where+%7B%3Fa+a%2Frdfs%3AsubClassOf*+%3FA.+%3Fa+%3Fp+%3Fb.+OPTIONAL+%7B+%3Fb+rdfs%3AsubClassOf+%3FB+%7D%7D&should-sponge=&format=text%2Fhtml&timeout=1001&debug=on&run=+Run+Query+'

HTTP/1.1 200 OK
Date: Thu, 28 Apr 2022 06:23:25 GMT
Server: Virtuoso/07.20.3233 (Linux) x86_64-pc-linux-gnu
Accept-Ranges: bytes
Content-disposition: filename=sparql_2022-04-28_06-23-25Z.html
Content-Type: text/html; charset=UTF-8
Content-Length: 72651

curl -I   0.00s user 0.01s system 1% cpu 1.709 total

This is unexpected as the results are indeed partial after hitting the timeout (~217 rows) but no headers is given to state so.

serasset avatar Apr 28 '22 06:04 serasset

We will attempt to recreate this issue in house and see what is going on.

pkleef avatar Apr 28 '22 11:04 pkleef

If you need to recreate the exact dbnary server settings, all dataset is available at http://kaiko.getalp.org/static/ontolex/latest/ and the database is created using script available at https://gitlab.com/gilles.serasset/dbnary/-/blob/master/kaiko/virtuoso/rotate-virtuoso-db.sh

(Sorry, I do not have a docker setting available (yet).)

serasset avatar May 02 '22 07:05 serasset