Performance issues when federating over multiple SPARQL endpoints
Issue type:
- :snail: Performance issue
Description:
I'm running the following command to create a federated endpoint:
```
comunica-sparql-http -w4 -t300 sparql@http://localhost:8081/sparql sparql@http://localhost:8082/sparql sparql@http://localhost:8083/sparql sparql@http://localhost:8084/sparql
```
Here are some useful metrics for each endpoint, obtained by running the following SPARQL query directly against each one (responses have the content type `application/sparql-results+json;charset=UTF-8`):

```sparql
SELECT DISTINCT * WHERE { ?s ?p ?o } LIMIT 100
```
| endpoint | time | response size |
|---|---|---|
| 1 | 161 ms | 38.8 KB |
| 2 | 53.4 s | 38.7 KB |
| 3 | 753 ms | 38.5 KB |
| 4 | 227 ms | 30.2 KB |
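For reference, the per-endpoint numbers above can be reproduced with a plain curl call along these lines (a sketch; the URL is the first endpoint, adapt as needed):

```sh
# Measure total time and downloaded size for a single endpoint.
curl -s -o /dev/null \
  -w 'time: %{time_total}s, size: %{size_download} bytes\n' \
  -H 'Accept: application/sparql-results+json' \
  --data-urlencode 'query=SELECT DISTINCT * WHERE { ?s ?p ?o } LIMIT 100' \
  http://localhost:8081/sparql
```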
Running the same query against the Comunica endpoint (http://localhost:3000/sparql), it takes about 2.57 min and returns nothing.
In the logs, I can see the following:
```
Server running on http://localhost:3000/sparql
Server worker (79250) running on http://localhost:3000/sparql
Server worker (79248) running on http://localhost:3000/sparql
Server worker (79249) running on http://localhost:3000/sparql
Server worker (79247) running on http://localhost:3000/sparql
[200] POST to /sparql
Requested media type: application/sparql-results+json
Received query query: SELECT DISTINCT * WHERE { ?s ?p ?o } LIMIT 100
Worker 79250 got assigned a new query (0).

<--- Last few GCs --->

[79250:0x158040000] 158323 ms: Scavenge 4020.7 (4123.6) -> 4018.2 (4125.9) MB, 9.6 / 0.0 ms (average mu = 0.540, current mu = 0.479) task;
[79250:0x158040000] 158351 ms: Scavenge 4022.8 (4125.9) -> 4020.0 (4143.4) MB, 12.3 / 0.0 ms (average mu = 0.540, current mu = 0.479) task;
[79250:0x158040000] 163267 ms: Mark-sweep 4031.7 (4143.6) -> 4023.1 (4148.9) MB, 4863.0 / 0.0 ms (average mu = 0.254, current mu = 0.052) task; scavenge might not succeed

<--- JS stacktrace --->

FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory
 1: 0x104811448 node::Abort() [/nix/store/n4pkh2cs837cak2kyjgd6sjskcqqb1gr-nodejs-18.17.1/bin/node]
 2: 0x10481162c node::ModifyCodeGenerationFromStrings(v8::Local<v8::Context>, v8::Local<v8::Value>, bool) [/nix/store/n4pkh2cs837cak2kyjgd6sjskcqqb1gr-nodejs-18.17.1/bin/node]
 3: 0x104977fac v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, bool) [/nix/store/n4pkh2cs837cak2kyjgd6sjskcqqb1gr-nodejs-18.17.1/bin/node]
 4: 0x104b367a0 v8::internal::EmbedderStackStateScope::EmbedderStackStateScope(v8::internal::Heap*, v8::internal::EmbedderStackStateScope::Origin, cppgc::EmbedderStackState) [/nix/store/n4pkh2cs837cak2kyjgd6sjskcqqb1gr-nodejs-18.17.1/bin/node]
 5: 0x104b351c4 v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [/nix/store/n4pkh2cs837cak2kyjgd6sjskcqqb1gr-nodejs-18.17.1/bin/node]
 6: 0x104bb9820 v8::internal::ScavengeJob::Task::RunInternal() [/nix/store/n4pkh2cs837cak2kyjgd6sjskcqqb1gr-nodejs-18.17.1/bin/node]
 7: 0x104871fbc node::PerIsolatePlatformData::RunForegroundTask(std::__1::unique_ptr<v8::Task, std::__1::default_delete<v8::Task> >) [/nix/store/n4pkh2cs837cak2kyjgd6sjskcqqb1gr-nodejs-18.17.1/bin/node]
 8: 0x104870cb0 node::PerIsolatePlatformData::FlushForegroundTasksInternal() [/nix/store/n4pkh2cs837cak2kyjgd6sjskcqqb1gr-nodejs-18.17.1/bin/node]
 9: 0x106c2ffb8 uv__async_io [/nix/store/3a685f2r0l2fnz899vwl70vl36yykj0r-libuv-1.46.0/lib/libuv.1.dylib]
10: 0x106c42d6c uv__io_poll [/nix/store/3a685f2r0l2fnz899vwl70vl36yykj0r-libuv-1.46.0/lib/libuv.1.dylib]
11: 0x106c3066c uv_run [/nix/store/3a685f2r0l2fnz899vwl70vl36yykj0r-libuv-1.46.0/lib/libuv.1.dylib]
12: 0x10474d940 node::SpinEventLoop(node::Environment*) [/nix/store/n4pkh2cs837cak2kyjgd6sjskcqqb1gr-nodejs-18.17.1/bin/node]
13: 0x10484fdb0 node::NodeMainInstance::Run() [/nix/store/n4pkh2cs837cak2kyjgd6sjskcqqb1gr-nodejs-18.17.1/bin/node]
14: 0x1047d9efc node::LoadSnapshotDataAndRun(node::SnapshotData const**, node::InitializationResult const*) [/nix/store/n4pkh2cs837cak2kyjgd6sjskcqqb1gr-nodejs-18.17.1/bin/node]
15: 0x1047da1e8 node::Start(int, char**) [/nix/store/n4pkh2cs837cak2kyjgd6sjskcqqb1gr-nodejs-18.17.1/bin/node]
16: 0x18b427f28 start [/usr/lib/dyld]
Worker 79250 died with SIGABRT. Starting new worker.
Server worker (79576) running on http://localhost:3000/sparql
```
I don't understand why it is hitting the 4 GB heap limit, since the result size for each endpoint is very small. Is there a memory leak?
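I could presumably postpone the crash by raising the V8 heap limit through `NODE_OPTIONS` (sketch below, with an arbitrary 8192 MB value), but that would only delay the problem rather than fix it:

```sh
# Workaround only: raise the V8 old-space heap limit to ~8 GB (arbitrary value).
NODE_OPTIONS="--max-old-space-size=8192" \
  comunica-sparql-http -w4 -t300 \
  sparql@http://localhost:8081/sparql sparql@http://localhost:8082/sparql \
  sparql@http://localhost:8083/sparql sparql@http://localhost:8084/sparql
```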
Also, by checking the logs of one of the endpoints, it seems that Comunica is issuing the following requests:

```sparql
SELECT (COUNT(*) AS ?count) WHERE { ?s ?p ?o. }
```

and

```sparql
SELECT ?s ?p ?o WHERE { ?s ?p ?o. }
```
The `LIMIT` keyword seems to be lost somewhere. I guess this is why everything explodes when a big endpoint is in the list: Comunica ends up querying the entire contents of that endpoint. Could the `LIMIT` be forwarded to the endpoints to avoid such issues?
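For illustration, here is the kind of request I would expect each endpoint to receive instead, with the `LIMIT` preserved (a hypothetical sketch, not what Comunica currently sends):

```sh
# Hypothetical pushed-down request keeping the original LIMIT.
curl -s \
  -H 'Accept: application/sparql-results+json' \
  --data-urlencode 'query=SELECT ?s ?p ?o WHERE { ?s ?p ?o. } LIMIT 100' \
  http://localhost:8081/sparql
```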
Environment:
| software | version |
|---|---|
| Comunica Engine | 2.8.2 |
| node | v18.17.1 |
| npm | 9.6.7 |
| yarn | 1.22.19 |
| Operating System | darwin (Darwin 22.5.0) |