Skosmos icon indicating copy to clipboard operation
Skosmos copied to clipboard

Query optimisation.

Open pulquero opened this issue 3 years ago • 5 comments

pulquero avatar Nov 09 '21 21:11 pulquero

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 0 Code Smells

No Coverage information No Coverage information
No Duplication information No Duplication information

sonarqubecloud[bot] avatar Nov 09 '21 21:11 sonarqubecloud[bot]

Thanks for the PR @pulquero !

Can you explain what you did here in a bit more detail?

Which operation was slow, and how much does the optimisation help?

Are there any side effects that you are aware of?

osma avatar Nov 10 '21 08:11 osma

For my data it was milliseconds vs minutes.

Added my observations as comments below:

 SELECT ?object ?label (GROUP_CONCAT(STR(?dir);separator=' ') as ?direct)
 WHERE {
    <$uri> a skos:Concept .
    OPTIONAL {
      <$uri> $propertyClause* ?object . # ?object may not be bound, but looks like we only care about ?object being bound, what is the reason for this being in an optional?
      OPTIONAL {
        ?object $propertyClause ?dir .
      }
    }
    OPTIONAL {
      ?object skos:prefLabel ?label . # only has an effect if ?object is bound, else it has no correlation with the non-optional part.
      FILTER (langMatches(lang(?label), "$lang"))
    }
    $otherlang
  }
  GROUP BY ?object ?label

pulquero avatar Nov 10 '21 12:11 pulquero

Thanks for the details. It's still not entirely clear to me which operation was slow from the user perspective. The function in question (generateTransitivePropertyQuery) is a rather low level one and is used, indirectly, at least to generate the QL query used for querying breadcrumb paths in the web UI, but also for some of the REST API methods. It would be good to know e.g. which direction is relevant here (transitive broaders - like in the breadcrumbs - or transitive narrowers?)

Also, what does your data look like? Is the hierarchy somehow big or complicated since the query ends up taking minutes? This hasn't been a big performance issue in the past for us, that's why I'm asking.

Also, which triple store? We're using Fuseki mostly, but are you perhaps using GraphDB as in your other PR?

osma avatar Nov 12 '21 13:11 osma

I'm using graphdb, and my vocabulary consists of a million skos:Concept. It is in the default graph alongside other vocabularies and cross-referenced. I think the avg tree depth is about 3. I believe

<$uri> a skos:Concept OPTIONAL { ?object skos:prefLabel ?label . # only has an effect if ?object is bound, else it has no correlation with the non-optional part. FILTER (langMatches(lang(?label), "$lang")) }

results in a cartesian product with ?object skos:prefLabel ?label matching everything in the entire default graph.

pulquero avatar Nov 12 '21 14:11 pulquero