
[Bug] RDF: Adding a resource to the canvas is very slow or fails with an out-of-memory error

Status: Open • kmcginnes opened this issue 11 months ago • 2 comments

Community Note

  • Please use a 👍 reaction to provide a +1/vote. This helps the community and maintainers prioritize this request.
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment.

Describe the bug

On a large RDF database, when I add a resource from the search panel to the canvas, it can take anywhere from 30 seconds to 10 minutes to complete. During this time, the user gets no indication that anything is happening.

  • OS: macOS 14.3.1
  • Browser: Arc (Google Chromium)
  • Graph Explorer Version: 1.5.1
  • Graph Database & Version: Amazon Neptune

To Reproduce

Steps to reproduce the behavior:

  1. Connect to a large RDF database with SPARQL
  2. Search for a resource with many neighbors or relationships
  3. Add that resource to the canvas
  4. Observe that nothing happens in the UI for a long time

You can see the pending request in the browser's network tab.

Slow response

In this example, the request took 1.4 min to complete:

[Screenshot: CleanShot 2024-03-08 at 16 14 56@2x]

And here you can see that there are not many neighbors:

[Screenshot: CleanShot 2024-03-08 at 16 14 43@2x]

And here is the query that was executed:

SELECT ?class (COUNT(?class) AS ?count) {
  ?subject a ?class {
    SELECT DISTINCT ?subject ?class {
      ?subject a ?class .
      { ?subject ?p <http://aws.amazon.com/neptune/csv2rdf/resource/270> }
      UNION
      { <http://aws.amazon.com/neptune/csv2rdf/resource/270> ?p ?subject }
    }
    LIMIT 500
  }
}
GROUP BY ?class

Out of Memory Error

The query that was executed was:

SELECT ?class (COUNT(?class) AS ?count) {
  ?subject a ?class {
    SELECT DISTINCT ?subject ?class {
      ?subject a ?class .
      { ?subject ?p <http://aws.amazon.com/neptune/csv2rdf/resource/414> }
      UNION
      { <http://aws.amazon.com/neptune/csv2rdf/resource/414> ?p ?subject }
    }
    LIMIT 500
  }
}
GROUP BY ?class

This resulted in an out-of-memory error:

{
    "error": {
        "status": 500,
        "message": "\n{\n  \"detailedMessage\": \"Operation terminated (out of memory)\",\n  \"requestId\": \"38d41423-0bb8-446a-8d11-4de1ee8cfb24\",\n  \"code\": \"MemoryLimitExceededException\",\n  \"message\": \"Operation terminated (out of memory)\"\n}"
    }
}
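The error body above nests a second JSON document inside the `message` string (note the escaped quotes), so a client that wants to tell the user what went wrong, instead of showing nothing, has to decode it twice. This is a minimal sketch, assuming the response always uses the `{"error": {"status": ..., "message": ...}}` envelope shown here; `parse_neptune_error` and `is_out_of_memory` are hypothetical helpers, not Graph Explorer's actual code.

```python
import json

# Response body copied from the issue. The raw string keeps the \n and \"
# escapes intact so json.loads sees exactly what the server sent.
RAW_RESPONSE = r'''{
    "error": {
        "status": 500,
        "message": "\n{\n  \"detailedMessage\": \"Operation terminated (out of memory)\",\n  \"requestId\": \"38d41423-0bb8-446a-8d11-4de1ee8cfb24\",\n  \"code\": \"MemoryLimitExceededException\",\n  \"message\": \"Operation terminated (out of memory)\"\n}"
    }
}'''


def parse_neptune_error(body: str) -> dict:
    """Hypothetical helper: unwrap the envelope and decode the embedded JSON."""
    outer = json.loads(body)["error"]
    message = outer.get("message")
    try:
        inner = json.loads(message)
    except (json.JSONDecodeError, TypeError):
        inner = None
    if not isinstance(inner, dict):
        # The message was plain text (or not a JSON object); keep it as-is.
        inner = {"message": message}
    return {"status": outer.get("status"), **inner}


def is_out_of_memory(error: dict) -> bool:
    """Hypothetical check for the error code reported in this issue."""
    return error.get("code") == "MemoryLimitExceededException"


err = parse_neptune_error(RAW_RESPONSE)
print(err["code"], err["requestId"])
# → MemoryLimitExceededException 38d41423-0bb8-446a-8d11-4de1ee8cfb24
```

A UI could use a check like this to show an explicit "query ran out of memory" message rather than leaving the canvas silently unchanged.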

Expected behavior

Adding a single resource to the canvas should not be slow or cause errors.

kmcginnes, Mar 08 '24 22:03

I believe that query could be improved significantly. It appears to count the neighbours of the "new resource", grouped by each neighbour's class.

I don't see a need for the subquery that selects all of the neighbours, nor for DISTINCT. To the best of my knowledge, the only way that subquery could produce duplicate results would be if there were two identical statements of the form ?subject a ?class. I don't believe that Neptune allows duplicate statements (this should be verified).

Given this, I would expect a query such as this to perform much better and produce equivalent results:

SELECT ?class (COUNT(?class) AS ?count) {
  ?neighbour a ?class .
  { ?neighbour ?p <http://aws.amazon.com/neptune/csv2rdf/resource/414> }
  UNION
  { <http://aws.amazon.com/neptune/csv2rdf/resource/414> ?p ?neighbour }
}
GROUP BY ?class
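Whether the two query shapes return the same counts can be checked on a toy graph before running anything against Neptune. The sketch is a hypothetical pure-Python model of SPARQL's bag semantics; the helper names, the `res:` identifiers and the sample triples are made up for illustration. In this model the shapes agree only when each neighbour is linked to the resource by exactly one triple: because `?p` binds to a different value for each connecting triple, a neighbour reached through two predicates (or in both directions) produces one solution per triple, so the flattened query counts it more than once. Dropping `LIMIT 500` also means the counts are no longer capped.

```python
from collections import Counter

# Hypothetical in-memory model of the two query shapes; not Neptune and not
# Graph Explorer code. Triples are (subject, predicate, object) tuples and,
# like an RDF graph, the list is assumed to contain no duplicate triples.
RDF_TYPE = "rdf:type"


def neighbour_rows(triples, resource):
    """One row per triple matching the UNION (bag semantics, ?p not projected)."""
    for s, _p, o in triples:
        if o == resource:
            yield s  # { ?neighbour ?p <resource> }
        if s == resource:
            yield o  # { <resource> ?p ?neighbour }


def join_rows(triples, resource):
    """Solutions of `?neighbour a ?class . { ... } UNION { ... }` as pairs."""
    rows = []
    for n in neighbour_rows(triples, resource):
        for s, p, o in triples:
            if s == n and p == RDF_TYPE:
                rows.append((n, o))
    return rows


def original_counts(triples, resource, limit=500):
    """Models the executed query: DISTINCT (subject, class) pairs, LIMIT, COUNT."""
    # LIMIT without ORDER BY keeps an arbitrary subset in SPARQL; here we
    # simply keep the first pairs in triple order.
    pairs = list(dict.fromkeys(join_rows(triples, resource)))[:limit]
    # The outer `?subject a ?class` re-matches each pair exactly once.
    return Counter(c for _n, c in pairs)


def proposed_counts(triples, resource):
    """Models the flattened query: no subquery, no DISTINCT, no LIMIT."""
    return Counter(c for _n, c in join_rows(triples, resource))


def distinct_neighbour_counts(triples, resource):
    """Models COUNT(DISTINCT ?neighbour) over the flattened pattern."""
    return Counter(c for _n, c in set(join_rows(triples, resource)))


R = "res:414"
TRIPLES = [
    ("res:1", RDF_TYPE, "Person"),
    ("res:2", RDF_TYPE, "Person"),
    ("res:3", RDF_TYPE, "Company"),
    ("res:1", "knows", R),
    ("res:2", "knows", R),
    ("res:2", "worksWith", R),  # a second predicate linking res:2 to R
    (R, "employedBy", "res:3"),
]

print(original_counts(TRIPLES, R))  # → Counter({'Person': 2, 'Company': 1})
print(proposed_counts(TRIPLES, R))  # → Counter({'Person': 3, 'Company': 1})
```

If per-neighbour counts are what the UI needs, `COUNT(DISTINCT ?neighbour)` (standard SPARQL 1.1) keeps the flattened shape and, as `distinct_neighbour_counts` models, matches the original query whenever the 500-row limit is not reached; whether it is actually cheaper on Neptune would need to be measured.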

Cole-Greer, Mar 09 '24 00:03

Possibly related to:

  • #184
  • #324

kmcginnes, Mar 18 '24 22:03