Not all data is correctly indexed/returned
Using the following RDF/XML data file: http://static.adamretter.org.uk/HHS_Provider_Relief_Fund.rdf.gz
I can never get more than 10 results back when querying it with SPARQL in eXist-db:
xquery version "3.1";
import module namespace sparql = "http://exist-db.org/xquery/sparql";
let $query1 := '
PREFIX ds: <https://data.cdc.gov/resource/kh8y-3es6/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT (count(DISTINCT ?state) as ?count)
WHERE {
?provider ds:state ?state
}
'
return
sparql:query($query1)
returns a count of 10:
<sparql xmlns="http://www.w3.org/2005/sparql-results#">
<head>
<variable name="count"/>
</head>
<results>
<result>
<binding name="count">
<literal datatype="http://www.w3.org/2001/XMLSchema#integer">10</literal>
</binding>
</result>
</results>
</sparql>
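Because the comparison between engines comes down to the single `count` binding, it can help to extract that number mechanically rather than by reading the XML. Below is a minimal sketch in Python (standard library only, since XQuery and SPARQL need a running engine). `sparql_count` is a hypothetical helper, and it assumes the response follows the W3C SPARQL Query Results XML Format shown above with a binding named `count`.

```python
import xml.etree.ElementTree as ET

# Namespace of the W3C SPARQL Query Results XML Format
SR = "{http://www.w3.org/2005/sparql-results#}"


def sparql_count(results_xml: str, var: str = "count") -> int:
    """Return the integer bound to `var` in the first result row.

    Hypothetical helper: assumes the binding holds a <literal> whose
    text is an integer, as in the eXist-db response above.
    """
    root = ET.fromstring(results_xml)
    # iter() walks in document order, so the first match is in the first row
    for binding in root.iter(f"{SR}binding"):
        if binding.get("name") == var:
            literal = binding.find(f"{SR}literal")
            return int(literal.text)
    raise ValueError(f"no binding named {var!r}")


# The response returned by eXist-db for the count query
exist_result = """<sparql xmlns="http://www.w3.org/2005/sparql-results#">
  <head><variable name="count"/></head>
  <results><result>
    <binding name="count">
      <literal datatype="http://www.w3.org/2001/XMLSchema#integer">10</literal>
    </binding>
  </result></results>
</sparql>"""

print(sparql_count(exist_result))  # 10
```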
However, running XQuery directly over the RDF/XML shows that the result should actually be 55:
count(distinct-values(doc("/db/hhs-provider/hhs-provider.rdf")//*:state/string(.)))
The result from the SPARQL query (10) is wrong; the XQuery result (55) is correct.
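The XQuery cross-check can also be reproduced outside any database. The sketch below is a hypothetical Python equivalent of `count(distinct-values(//*:state/string(.)))`: it matches elements with local name `state` in any namespace and compares their text content. It streams the input with `xml.etree.ElementTree.iterparse` and clears each top-level node once it is finished, so the full dump need not be held in memory. The inline sample document and the `count_in_gz` helper are illustrative assumptions; run against the real dump, the result should agree with the XQuery count if the two expressions are truly equivalent.

```python
import gzip
import io
import xml.etree.ElementTree as ET


def count_distinct_states(source) -> int:
    """Count distinct text values of elements whose local name is 'state'.

    Mirrors //*:state/string(.): the namespace is ignored and the
    element's descendant text is compared as a string.
    """
    seen = set()
    depth = 0
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            depth += 1
            continue
        depth -= 1
        # elem.tag is "{namespace}local" or just "local"
        if elem.tag.rsplit("}", 1)[-1] == "state":
            seen.add("".join(elem.itertext()))
        if depth == 1:
            # A child of the root (e.g. rdf:Description) is complete:
            # release its subtree to keep memory bounded
            elem.clear()
    return len(seen)


def count_in_gz(path: str) -> int:
    """Hypothetical helper for the gzipped dump, e.g. HHS_Provider_Relief_Fund.rdf.gz."""
    with gzip.open(path, "rb") as f:
        return count_distinct_states(f)


# Illustrative sample (not from the real dataset): two distinct states
SAMPLE = b"""<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ds="https://data.cdc.gov/resource/kh8y-3es6/">
  <rdf:Description rdf:about="urn:example:p1"><ds:state>TX</ds:state></rdf:Description>
  <rdf:Description rdf:about="urn:example:p2"><ds:state>CA</ds:state></rdf:Description>
  <rdf:Description rdf:about="urn:example:p3"><ds:state>TX</ds:state></rdf:Description>
</rdf:RDF>"""

print(count_distinct_states(io.BytesIO(SAMPLE)))  # 2
```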
I also tested this directly with TDB from Apache Jena 3.15.0.
I loaded the data:
$ bin/tdbloader --loc=/tmp/tdb /tmp/HHS_Provider_Relief_Fund.rd
...
** Completed: 1,471,085 triples loaded in 18.07 seconds [Rate: 81,396.84 per second]
I created the SPARQL file /tmp/states.sparql:
PREFIX ds: <https://data.cdc.gov/resource/kh8y-3es6/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT (count(DISTINCT ?state) as ?count)
WHERE {
?provider ds:state ?state
}
I then executed the SPARQL query:
$ bin/tdbquery --loc=/tmp/tdb --file /tmp/states.sparql
---------
| count |
=========
| 55 |
---------
So querying TDB directly returns the correct result, which leads me to suspect a bug somewhere in the exist-sparql module.