jena-sparql-api
Binary Search fails on a databus resource
The request below results in a syntax error in the binary search engine (currently in jena-sparql-api-io-core). The reason is that the block boundaries in the bzip2 file are not detected correctly, which causes data to be cut off. The buggy code still uses my regex matching and needs to be updated to use the Hadoop codec.
```sparql
SELECT * {
  SERVICE <x-binsearch:vfs:https://downloads.dbpedia.org/repo/lts/transition/links/2017.11.01/links_domain=bricklink_lang=en.nt.bz2> {
    { SELECT * {
      ?s ?p ?o
    } LIMIT 10 }
  }
}
```
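Some background on why boundary detection is fragile: every bzip2 block starts with the 48-bit magic `0x314159265359`, and blocks are packed bit by bit, so a block header can begin at any bit offset, not only on byte boundaries. A byte-oriented pattern (assuming the old regex matched on bytes) only finds the headers that happen to be byte-aligned. The minimal Java sketch below scans for the magic at every bit offset. The class and method names are hypothetical; this is neither the jenax nor the Hadoop implementation. The 48-bit pattern can also occur by chance inside compressed data, so a real reader still has to validate each candidate, for example by trying to decode from it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only.
final class Bzip2BlockScanner {
    // 48-bit bzip2 block header magic (BCD digits of pi).
    static final long BLOCK_MAGIC = 0x314159265359L;
    static final long MASK_48 = (1L << 48) - 1;

    /** Returns the bit offsets (MSB-first, as bzip2 writes bits) at which the block magic starts. */
    static List<Long> findBlockStarts(byte[] data) {
        List<Long> starts = new ArrayList<>();
        long window = 0;   // last 48 bits seen
        long bitsSeen = 0;
        for (byte b : data) {
            for (int i = 7; i >= 0; i--) {
                // Shift the next bit into the sliding 48-bit window.
                window = ((window << 1) | ((b >> i) & 1)) & MASK_48;
                bitsSeen++;
                if (bitsSeen >= 48 && window == BLOCK_MAGIC) {
                    starts.add(bitsSeen - 48);
                }
            }
        }
        return starts;
    }
}
```

For a stream that begins with the 4-byte header `BZh9` directly followed by the first block, the first candidate is reported at bit offset 32; a block header shifted by 3 bits is still found, whereas a byte-level match of the six magic bytes would miss it.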
The URL is still online, and the query works with the revised code, which now relies only on Hadoop's `BZip2Codec` (and none of my old regex matching).
Remaining work before finally closing this issue:
- [ ] transfer issue to jenax
- [ ] add this file as a test case
- [ ] add buffering under the bzip2 stream; right now, for this small file, it repeatedly fetches the same bz2 block via HTTP.
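The buffering item could take the form of a small page cache between the HTTP range reads and the bzip2 decoder, so that re-reading the same region is served from memory instead of triggering another request. A minimal sketch, assuming a hypothetical `RangeSource` interface that stands in for an HTTP range request; neither type exists in jenax, and the page size and LRU eviction policy are illustrative choices.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for an HTTP range request; not a jenax API. */
interface RangeSource {
    byte[] fetch(long offset, int length);
    long size();
}

/** Caches fixed-size pages of a RangeSource so repeated reads of one region fetch it only once. */
final class PageCachedSource {
    private final RangeSource source;
    private final int pageSize;
    private final Map<Long, byte[]> pages;

    PageCachedSource(RangeSource source, int pageSize, int maxPages) {
        this.source = source;
        this.pageSize = pageSize;
        // Access-ordered LinkedHashMap: evicts the least recently used page once maxPages is exceeded.
        this.pages = new LinkedHashMap<Long, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Long, byte[]> eldest) {
                return size() > maxPages;
            }
        };
    }

    /** Returns the byte at the given absolute position (0-255), or -1 outside the source. */
    int read(long position) {
        if (position < 0 || position >= source.size()) return -1;
        long pageIndex = position / pageSize;
        byte[] page = pages.get(pageIndex);
        if (page == null) {
            long start = pageIndex * pageSize;
            int len = (int) Math.min(pageSize, source.size() - start);
            page = source.fetch(start, len); // only cache misses reach the source
            pages.put(pageIndex, page);
        }
        return page[(int) (position - pageIndex * pageSize)] & 0xFF;
    }
}
```

With 32-byte pages, reading the first 32 bytes three times results in a single call to `fetch`; in the real setting the page size would presumably be chosen on the order of a bz2 block.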