[Bug] RDF: Adding resource to the canvas is very slow or fails with out of memory
Community Note
- Please use a 👍 reaction to provide a +1/vote. This helps the community and maintainers prioritize this request.
- If you are interested in working on this issue or have submitted a pull request, please leave a comment.
Describe the bug
On a larger RDF database, when I add a resource from the search panel to the canvas it can take anywhere from 30 seconds to 10 minutes to complete. During this time no indication is given to the user that something is happening.
- OS: macOS 14.3.1
- Browser: Arc (Google Chromium)
- Graph Explorer Version: 1.5.1
- Graph Database & Version: Amazon Neptune
To Reproduce
Steps to reproduce the behavior:
- Connect to a large RDF database with SPARQL
- Search for a resource with many neighbors or relationships
- Add that resource to the canvas
- Observe nothing happening in the UI for a long time
You can see the pending request in the browser's network tab.
Slow response
In this example, the request took 1.4 min to complete:
And here you can see there are not that many neighbors:
And here is the query that was executed:
SELECT ?class (COUNT(?class) AS ?count) {
  ?subject a ?class {
    SELECT DISTINCT ?subject ?class {
      ?subject a ?class .
      { ?subject ?p <http://aws.amazon.com/neptune/csv2rdf/resource/270> }
      UNION
      { <http://aws.amazon.com/neptune/csv2rdf/resource/270> ?p ?subject }
    }
    LIMIT 500
  }
}
GROUP BY ?class
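To time this query outside the UI, a small script along these lines may help. It is only a sketch: `build_neighbour_count_query` and `time_query` are hypothetical helper names, the endpoint URL is a placeholder, and it assumes the SPARQL endpoint accepts an unauthenticated form-encoded POST (a cluster with IAM authentication enabled would need signed requests, which this does not handle).

```python
import time
import urllib.parse
import urllib.request


def build_neighbour_count_query(resource_uri: str) -> str:
    """Return the neighbour-count query from this report for one resource URI."""
    return f"""SELECT ?class (COUNT(?class) AS ?count) {{
  ?subject a ?class {{
    SELECT DISTINCT ?subject ?class {{
      ?subject a ?class .
      {{ ?subject ?p <{resource_uri}> }}
      UNION
      {{ <{resource_uri}> ?p ?subject }}
    }}
    LIMIT 500
  }}
}}
GROUP BY ?class"""


def time_query(endpoint: str, query: str, timeout_s: float = 900.0) -> float:
    """POST the query to a SPARQL endpoint and return the elapsed seconds."""
    # Form-encoded POST with a "query" parameter, as in the SPARQL 1.1 Protocol.
    data = urllib.parse.urlencode({"query": query}).encode("utf-8")
    request = urllib.request.Request(
        endpoint,
        data=data,
        headers={"Accept": "application/sparql-results+json"},
    )
    start = time.perf_counter()
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        response.read()  # drain the body so the full response time is measured
    return time.perf_counter() - start


# Example usage (placeholder endpoint, not executed here):
# query = build_neighbour_count_query(
#     "http://aws.amazon.com/neptune/csv2rdf/resource/270")
# print(time_query("https://your-neptune-host:8182/sparql", query))
```

Running the same helper with the simplified query proposed further down would give a like-for-like comparison on the same resource.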
Out of Memory Error
The query that was executed was:
SELECT ?class (COUNT(?class) AS ?count) {
  ?subject a ?class {
    SELECT DISTINCT ?subject ?class {
      ?subject a ?class .
      { ?subject ?p <http://aws.amazon.com/neptune/csv2rdf/resource/414> }
      UNION
      { <http://aws.amazon.com/neptune/csv2rdf/resource/414> ?p ?subject }
    }
    LIMIT 500
  }
}
GROUP BY ?class
This resulted in an out of memory error:
{
"error": {
"status": 500,
"message": "\n{\n \"detailedMessage\": \"Operation terminated (out of memory)\",\n \"requestId\": \"38d41423-0bb8-446a-8d11-4de1ee8cfb24\",\n \"code\": \"MemoryLimitExceededException\",\n \"message\": \"Operation terminated (out of memory)\"\n}"
}
}
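Note that the database's own error is itself JSON, embedded as a string inside the outer `message` field. A minimal sketch of unpacking it, assuming the response always has the shape shown above (other failures may not); `extract_inner_error` is a hypothetical helper name, not part of Graph Explorer:

```python
import json

# The response body exactly as reported above, kept as raw text.
SAMPLE_BODY = r'''{
  "error": {
    "status": 500,
    "message": "\n{\n \"detailedMessage\": \"Operation terminated (out of memory)\",\n \"requestId\": \"38d41423-0bb8-446a-8d11-4de1ee8cfb24\",\n \"code\": \"MemoryLimitExceededException\",\n \"message\": \"Operation terminated (out of memory)\"\n}"
  }
}'''


def extract_inner_error(body: str) -> dict:
    """Decode the outer envelope, then the JSON string nested in error.message."""
    outer = json.loads(body)["error"]
    inner = json.loads(outer["message"])  # leading newline is valid JSON whitespace
    return {
        "status": outer["status"],
        "code": inner.get("code"),
        "detail": inner.get("detailedMessage"),
        "request_id": inner.get("requestId"),
    }


info = extract_inner_error(SAMPLE_BODY)
print(info["status"], info["code"])
```

Surfacing the inner `code` would let the UI distinguish an out-of-memory failure from other 500 responses.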
Expected behavior
Adding a single resource to the canvas should not be slow or cause errors.
I believe that query could be improved significantly. That query appears to be counting the number of neighbours for the "new resource", grouped by the neighbour's class.
I don't see a need for it to have the subquery to select all of the neighbours, nor do I see a need for using DISTINCT here. To the best of my knowledge, the only way that subquery could produce duplicate results would be if there were 2 duplicate statements of the form ?subject a ?class. I don't believe that Neptune allows for duplicate statements (this should be verified).
Given this, I would expect a query such as this to perform much better and produce equivalent results:
SELECT ?class (COUNT(?class) AS ?count) {
  ?neighbour a ?class .
  { ?neighbour ?p <http://aws.amazon.com/neptune/csv2rdf/resource/414> }
  UNION
  { <http://aws.amazon.com/neptune/csv2rdf/resource/414> ?p ?neighbour }
}
GROUP BY ?class
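One case the reasoning above may not cover: because ?p is not projected, a neighbour connected to the resource by more than one predicate (or in both directions) matches the pattern once per connection, even when no statement is duplicated. The following pure-Python simulation of the query semantics on a toy graph illustrates the difference. All `ex:` names are made up for illustration, and the sketch does not model the LIMIT 500 of the original subquery.

```python
from collections import Counter

RDF_TYPE = "rdf:type"

# Toy graph (hypothetical data): ex:12 is linked to ex:270 by two predicates.
TRIPLES = {
    ("ex:270", "ex:route", "ex:12"),
    ("ex:270", "ex:contains", "ex:12"),
    ("ex:7", "ex:route", "ex:270"),
    ("ex:12", RDF_TYPE, "ex:Airport"),
    ("ex:7", RDF_TYPE, "ex:Airport"),
}


def typed_neighbour_rows(triples, resource):
    """Rows (neighbour, predicate, class) matching the UNION joined with the type pattern."""
    links = []
    for s, p, o in triples:
        if o == resource:  # { ?neighbour ?p <resource> }
            links.append((s, p))
        if s == resource:  # { <resource> ?p ?neighbour }
            links.append((o, p))
    types = [(s, o) for s, p, o in triples if p == RDF_TYPE]
    return [(n, p, c) for n, p in links for s, c in types if s == n]


def count_rows_per_class(triples, resource):
    """COUNT(?class) with no DISTINCT: one count per matching row."""
    return Counter(c for _, _, c in typed_neighbour_rows(triples, resource))


def count_distinct_neighbours_per_class(triples, resource):
    """COUNT(DISTINCT ?neighbour): each neighbour counted once per class."""
    pairs = {(n, c) for n, _, c in typed_neighbour_rows(triples, resource)}
    return Counter(c for _, c in pairs)


print(count_rows_per_class(TRIPLES, "ex:270"))                 # Counter({'ex:Airport': 3})
print(count_distinct_neighbours_per_class(TRIPLES, "ex:270"))  # Counter({'ex:Airport': 2})
```

If this matters for the real data, replacing COUNT(?class) with COUNT(DISTINCT ?neighbour) in the simplified query should keep the per-class counts in line with the original without reintroducing the subquery. Whether it performs acceptably on Neptune would need to be measured.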
Possibly related to
- #184
- #324