jena-sparql-api

Binary Search fails on a databus resource

Open · Aklakan opened this issue 4 years ago • 1 comment

The request below triggers a syntax error in the binary search engine (currently in jena-sparql-api-io-core). The cause is that block boundaries in the bzip2 stream are not detected correctly, so data gets cut off. The buggy code still uses my regex matching and needs to be updated to use the Hadoop codec.

SELECT * {
  SERVICE <x-binsearch:vfs:https://downloads.dbpedia.org/repo/lts/transition/links/2017.11.01/links_domain=bricklink_lang=en.nt.bz2> {
    { SELECT * {
      ?s ?p ?o
    } LIMIT 10 }
  }
}
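
For context on why boundary detection is fragile: bzip2 blocks start with the 48-bit magic 0x314159265359, and blocks are not byte-aligned, so a byte-level or regex match can miss a boundary or cut a block at the wrong position. The following is a minimal sketch of a bit-level scan, not the code from jena-sparql-api-io-core or Hadoop; the class and method names are hypothetical. Note that the magic can also occur by chance inside compressed data, so a real reader still has to validate a candidate block by decoding it.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative helper (hypothetical name); not part of jena-sparql-api or Hadoop. */
final class Bzip2BlockScan {
    /** 48-bit bzip2 block header magic. */
    static final long BLOCK_MAGIC = 0x314159265359L;
    /** Mask that keeps the lowest 48 bits of the sliding window. */
    static final long MASK_48 = 0xFFFFFFFFFFFFL;

    /** Returns the bit offsets (from the start of data) at which the block magic begins. */
    static List<Long> findBlockStarts(byte[] data) {
        List<Long> starts = new ArrayList<>();
        long window = 0;   // the last 48 bits seen
        long bitsRead = 0; // total number of bits consumed so far
        for (byte b : data) {
            // bzip2 packs bits most-significant-bit first
            for (int i = 7; i >= 0; i--) {
                window = ((window << 1) | ((b >> i) & 1)) & MASK_48;
                bitsRead++;
                if (bitsRead >= 48 && window == BLOCK_MAGIC) {
                    starts.add(bitsRead - 48);
                }
            }
        }
        return starts;
    }
}
```

The offsets are returned in bits rather than bytes because a block may begin in the middle of a byte.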

Aklakan, Sep 02 '21 08:09

The URL is still online, and the request works with the revised code, which now relies only on Hadoop's BZip2Codec (and none of my old matching code).

Remaining work before finally closing this issue:

  • [ ] transfer issue to jenax
  • [ ] add this file as a test case
  • [ ] add buffering under the bzip2 stream; right now, even for this small file, it repeatedly fetches the same bz2 block via HTTP.
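
One possible shape for the buffering item is a small LRU cache of fixed-size chunks placed between the HTTP source and the bzip2 decoder, so that repeated reads of the same region hit memory instead of the network. This is only a sketch under assumptions: `ChunkCache`, `chunkAt`, and the `fetcher` callback (which would issue the HTTP range request) are hypothetical names, and the actual fix in jenax may look different.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.LongFunction;

/** Illustrative LRU cache of fixed-size byte chunks (hypothetical; not the jenax implementation). */
final class ChunkCache {
    private final int chunkSize;
    private final int maxChunks;
    /** Loads the chunk starting at the given byte offset, e.g. via an HTTP range request. */
    private final LongFunction<byte[]> fetcher;
    private final Map<Long, byte[]> chunks;
    /** Number of times the fetcher was actually invoked. */
    int fetchCount = 0;

    ChunkCache(int chunkSize, int maxChunks, LongFunction<byte[]> fetcher) {
        this.chunkSize = chunkSize;
        this.maxChunks = maxChunks;
        this.fetcher = fetcher;
        // accessOrder=true turns the LinkedHashMap into an LRU structure
        this.chunks = new LinkedHashMap<Long, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Long, byte[]> eldest) {
                return size() > ChunkCache.this.maxChunks;
            }
        };
    }

    /** Returns the chunk containing the given byte offset, fetching it only on a cache miss. */
    byte[] chunkAt(long offset) {
        long index = offset / chunkSize;
        byte[] chunk = chunks.get(index);
        if (chunk == null) {
            chunk = fetcher.apply(index * chunkSize);
            fetchCount++;
            chunks.put(index, chunk);
        }
        return chunk;
    }
}
```

With such a layer, the binary search can probe the same bz2 block several times while only the first probe goes over HTTP.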

Aklakan, Sep 05 '24 19:09