Suboptimal queries when using remote SPARQL endpoint

Open elordis opened this issue 1 year ago • 0 comments

Hello.

We have an existing triplestore with a running SPARQL endpoint and want to attach a GraphQL interface to it. The problem is that any query we try to run results in enormously large data transfers. Our config is pretty simple:

{
    "DataSources": [
        {
            "Name": "corese",
            "Provider": "remote",
            "Default": true,
            "DefaultNamespace": "http://example.net/device.owl#",
            "Prefixes": {
                "net": "http://example.net/device.owl#"
            },
            "Settings": {
                "EndpointUri": "<omitted>"
            }
        }
    ],
    "Definitions": [
        {
            "Provider": "inline",
            "Settings": {
                "Schema": {
                    "Query": {
                        "Fields": [
                            {
                                "Name": "device",
                                "Object": "Device",
                                "IsArray": true
                            }
                        ]
                    },
                    "Interfaces": [
                        {
                            "Name": "IRdfsExtensions",
                            "Namespace": "http://www.w3.org/2000/01/rdf-schema#",
                            "Fields": [
                                {
                                    "Name": "label",
                                    "Scalar": "String"
                                }
                            ]
                        }
                    ],
                    "Types": [
                        {
                            "Name": "Device",
                            "Interfaces": [
                                "IRdfsExtensions"
                            ]
                        }
                    ]
                }
            }
        }
    ]
}

We do a simple query like this:

query {
	device(id: "http://example.net/device.owl#<omitted>"){
		label
	}
}

It takes more than 4 seconds to complete on our database, while a SPARQL query for similar results would complete in 40 ms. I decided to look at how our database is queried and discovered that GraphSPARQL pretty much tries to read the whole database on every request. For example, the request above is translated into two queries. The first one is fine:

CONSTRUCT
{ ?__s0 <https://schema.uibk.ac.at/GraphSPARQL/triples/p0> ?__o0 . }
WHERE
{
  { }
  UNION
  {
    ?__o0 a ?__s0 .
    FILTER((?__s0 = <http://example.net/device.owl#Device>) && (?__o0 = <http://example.net/device.owl#<omitted>>))
  }
}

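But the second one reads every label in the database, because the VALUES block and the label pattern are combined with UNION instead of being joined. Also, if I didn't use a filter in the request above, the VALUES section would still include the results from the first query, of which there may be a few thousand rows.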

CONSTRUCT
{ ?__s0 <https://schema.uibk.ac.at/GraphSPARQL/triples/p0> ?__o0 . }
WHERE
{
  { }
  UNION
  {
    {
      VALUES ( ?__s0 )
      {( <http://example.net/device.owl#<omitted>>)}
    }
    UNION
    { ?__s0 <http://www.w3.org/2000/01/rdf-schema#label> ?__o0 . }
  }
}
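
The effect of the UNION can be reproduced with a toy model. The sketch below is not GraphSPARQL code and does not talk to a real endpoint; it is a minimal Python illustration, assuming a hypothetical in-memory triple list (d1, d2, d3 are made-up device IRIs) and simplified `union`, `join`, and `construct` helpers written only for this example. It shows that a UNION of the VALUES block with the label pattern returns every label, while a join returns only the requested one.

```python
# Toy model of SPARQL evaluation: a solution is a dict of variable bindings.
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# Hypothetical data: three labelled resources, only one of which was requested.
TRIPLES = [
    ("http://example.net/device.owl#d1", RDFS_LABEL, "Device 1"),
    ("http://example.net/device.owl#d2", RDFS_LABEL, "Device 2"),
    ("http://example.net/device.owl#d3", RDFS_LABEL, "Device 3"),
]


def values_block(subjects):
    """VALUES (?s) { ... }: one solution per listed subject, ?o unbound."""
    return [{"s": s} for s in subjects]


def label_pattern(triples):
    """?s rdfs:label ?o: one solution per label triple in the store."""
    return [{"s": s, "o": o} for s, p, o in triples if p == RDFS_LABEL]


def union(left, right):
    """UNION keeps the solutions of both branches; nothing is filtered."""
    return left + right


def join(left, right):
    """Join keeps only pairs of solutions that agree on shared variables."""
    merged = []
    for a in left:
        for b in right:
            if all(a[k] == b[k] for k in a.keys() & b.keys()):
                merged.append({**a, **b})
    return merged


def construct(solutions):
    """CONSTRUCT skips template triples that contain an unbound variable."""
    return [(sol["s"], sol["o"]) for sol in solutions if "s" in sol and "o" in sol]


wanted = ["http://example.net/device.owl#d1"]
via_union = construct(union(values_block(wanted), label_pattern(TRIPLES)))
via_join = construct(join(values_block(wanted), label_pattern(TRIPLES)))

print(len(via_union), len(via_join))  # prints "3 1"
```

With the UNION form the size of the result grows with the number of labels in the store, not with the number of requested ids, which matches the transfer sizes we observe.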

And that is with the simplest GraphQL queries. When we try to do anything resembling our production needs, things get even worse.

So, is everything working as intended? Did we maybe miss some hidden options to speed things up?

elordis, Nov 07 '23 15:11