
Performance issues when federating over multiple SPARQL endpoints

Open ludovicm67 opened this issue 10 months ago • 4 comments

Issue type:

  • :snail: Performance issue

Description:

I'm running the following command to create a federated endpoint:

comunica-sparql-http -w4 -t300 sparql@http://localhost:8081/sparql sparql@http://localhost:8082/sparql sparql@http://localhost:8083/sparql sparql@http://localhost:8084/sparql 

Here are some useful metrics for each endpoint, obtained by running the following SPARQL query directly against each one (responses have the content type application/sparql-results+json;charset=UTF-8):

SELECT DISTINCT * WHERE { ?s ?p ?o } LIMIT 100
| endpoint | time   | response size |
|----------|--------|---------------|
| 1        | 161 ms | 38.8 KB       |
| 2        | 53.4 s | 38.7 KB       |
| 3        | 753 ms | 38.5 KB       |
| 4        | 227 ms | 30.2 KB       |
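For context, per-endpoint timings like those in the table can be collected with curl's `--write-out` variables. This is only a measurement sketch: the URL and query are taken from the report above, but how I invoke the endpoint (POST with `--data-urlencode`) is an assumption about its protocol support.

```shell
# Sketch: measure total time and body size for one endpoint.
# The URL is one of the sources from the federation command above.
ENDPOINT=http://localhost:8081/sparql
QUERY='SELECT DISTINCT * WHERE { ?s ?p ?o } LIMIT 100'
curl -s -o /dev/null \
  -H 'Accept: application/sparql-results+json' \
  --data-urlencode "query=$QUERY" \
  -w 'time: %{time_total}s  size: %{size_download} bytes\n' \
  "$ENDPOINT" || true  # sketch only; keeps going if the endpoint is down
```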

Running the query on the Comunica endpoint (http://localhost:3000/sparql) takes about 2.57 minutes and returns nothing.

In the logs, I'm able to see the following:

Server running on http://localhost:3000/sparql
Server worker (79250) running on http://localhost:3000/sparql
Server worker (79248) running on http://localhost:3000/sparql
Server worker (79249) running on http://localhost:3000/sparql
Server worker (79247) running on http://localhost:3000/sparql
[200] POST to /sparql
      Requested media type: application/sparql-results+json
      Received query query: SELECT DISTINCT * WHERE { ?s ?p ?o } LIMIT 100
Worker 79250 got assigned a new query (0).

<--- Last few GCs --->

[79250:0x158040000]   158323 ms: Scavenge 4020.7 (4123.6) -> 4018.2 (4125.9) MB, 9.6 / 0.0 ms  (average mu = 0.540, current mu = 0.479) task;
[79250:0x158040000]   158351 ms: Scavenge 4022.8 (4125.9) -> 4020.0 (4143.4) MB, 12.3 / 0.0 ms  (average mu = 0.540, current mu = 0.479) task;
[79250:0x158040000]   163267 ms: Mark-sweep 4031.7 (4143.6) -> 4023.1 (4148.9) MB, 4863.0 / 0.0 ms  (average mu = 0.254, current mu = 0.052) task; scavenge might not succeed


<--- JS stacktrace --->

FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory
 1: 0x104811448 node::Abort() [/nix/store/n4pkh2cs837cak2kyjgd6sjskcqqb1gr-nodejs-18.17.1/bin/node]
 2: 0x10481162c node::ModifyCodeGenerationFromStrings(v8::Local<v8::Context>, v8::Local<v8::Value>, bool) [/nix/store/n4pkh2cs837cak2kyjgd6sjskcqqb1gr-nodejs-18.17.1/bin/node]
 3: 0x104977fac v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, bool) [/nix/store/n4pkh2cs837cak2kyjgd6sjskcqqb1gr-nodejs-18.17.1/bin/node]
 4: 0x104b367a0 v8::internal::EmbedderStackStateScope::EmbedderStackStateScope(v8::internal::Heap*, v8::internal::EmbedderStackStateScope::Origin, cppgc::EmbedderStackState) [/nix/store/n4pkh2cs837cak2kyjgd6sjskcqqb1gr-nodejs-18.17.1/bin/node]
 5: 0x104b351c4 v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [/nix/store/n4pkh2cs837cak2kyjgd6sjskcqqb1gr-nodejs-18.17.1/bin/node]
 6: 0x104bb9820 v8::internal::ScavengeJob::Task::RunInternal() [/nix/store/n4pkh2cs837cak2kyjgd6sjskcqqb1gr-nodejs-18.17.1/bin/node]
 7: 0x104871fbc node::PerIsolatePlatformData::RunForegroundTask(std::__1::unique_ptr<v8::Task, std::__1::default_delete<v8::Task> >) [/nix/store/n4pkh2cs837cak2kyjgd6sjskcqqb1gr-nodejs-18.17.1/bin/node]
 8: 0x104870cb0 node::PerIsolatePlatformData::FlushForegroundTasksInternal() [/nix/store/n4pkh2cs837cak2kyjgd6sjskcqqb1gr-nodejs-18.17.1/bin/node]
 9: 0x106c2ffb8 uv__async_io [/nix/store/3a685f2r0l2fnz899vwl70vl36yykj0r-libuv-1.46.0/lib/libuv.1.dylib]
10: 0x106c42d6c uv__io_poll [/nix/store/3a685f2r0l2fnz899vwl70vl36yykj0r-libuv-1.46.0/lib/libuv.1.dylib]
11: 0x106c3066c uv_run [/nix/store/3a685f2r0l2fnz899vwl70vl36yykj0r-libuv-1.46.0/lib/libuv.1.dylib]
12: 0x10474d940 node::SpinEventLoop(node::Environment*) [/nix/store/n4pkh2cs837cak2kyjgd6sjskcqqb1gr-nodejs-18.17.1/bin/node]
13: 0x10484fdb0 node::NodeMainInstance::Run() [/nix/store/n4pkh2cs837cak2kyjgd6sjskcqqb1gr-nodejs-18.17.1/bin/node]
14: 0x1047d9efc node::LoadSnapshotDataAndRun(node::SnapshotData const**, node::InitializationResult const*) [/nix/store/n4pkh2cs837cak2kyjgd6sjskcqqb1gr-nodejs-18.17.1/bin/node]
15: 0x1047da1e8 node::Start(int, char**) [/nix/store/n4pkh2cs837cak2kyjgd6sjskcqqb1gr-nodejs-18.17.1/bin/node]
16: 0x18b427f28 start [/usr/lib/dyld]
Worker 79250 died with SIGABRT. Starting new worker.
Server worker (79576) running on http://localhost:3000/sparql

I don't understand why it is hitting the 4 GB heap limit, since the result set from each endpoint is very small. Is there a memory leak?
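As a side note (a mitigation sketch, not a fix for the underlying behavior): the crash at roughly 4 GB matches Node's default V8 old-space limit on this machine, which can be raised with `--max-old-space-size`. Assuming `comunica-sparql-http` and its workers honor `NODE_OPTIONS` like any Node process, prefixing the server command with the same setting would raise the ceiling; this only delays the OOM if the engine really does materialize full result sets.

```shell
# Print the current V8 heap limit (about 4 GB here, per the crash log),
# then show it raised via NODE_OPTIONS. To apply this to the server, run
# e.g.: NODE_OPTIONS=--max-old-space-size=8192 comunica-sparql-http ...
node -e 'console.log(Math.round(require("v8").getHeapStatistics().heap_size_limit / 1048576), "MB")'
NODE_OPTIONS=--max-old-space-size=8192 \
  node -e 'console.log(Math.round(require("v8").getHeapStatistics().heap_size_limit / 1048576), "MB")'
```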

Also, checking the logs of one of the endpoints, it seems Comunica is issuing the following requests:

SELECT (COUNT(*) AS ?count) WHERE { ?s ?p ?o. }

and

SELECT ?s ?p ?o WHERE { ?s ?p ?o. }

The LIMIT keyword seems to be lost somewhere.

I guess this is why things explode when a large endpoint is in the list: Comunica ends up querying everything from the big endpoint.

Would it be possible to forward the LIMIT to the endpoints to avoid such issues?
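One possible workaround in the meantime (an untested sketch, not a confirmed fix): SPARQL 1.1 federated queries let you wrap a per-source subselect in a SERVICE clause, so the LIMIT becomes part of the query the remote endpoint evaluates itself rather than something Comunica applies after fetching everything. This assumes Comunica's SERVICE support is used for the source in question; the endpoint chosen below is just one of the sources from the report.

```shell
# Sketch: push the LIMIT down to one source via a subselect inside SERVICE,
# so the remote endpoint applies it instead of streaming all its triples.
QUERY='SELECT * WHERE {
  SERVICE <http://localhost:8082/sparql> {
    SELECT DISTINCT * WHERE { ?s ?p ?o } LIMIT 100
  }
}'
curl -s -H 'Accept: application/sparql-results+json' \
  --data-urlencode "query=$QUERY" \
  http://localhost:3000/sparql || true  # sketch only; needs the servers running
```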


Environment:

| software         | version                |
|------------------|------------------------|
| Comunica Engine  | 2.8.2                  |
| node             | v18.17.1               |
| npm              | 9.6.7                  |
| yarn             | 1.22.19                |
| Operating System | darwin (Darwin 22.5.0) |

ludovicm67 · Aug 23 '23 11:08