jsonld.js

Performance of JSON-LD framing for larger documents

Open jindrichmynarz opened this issue 7 years ago • 9 comments

I have a larger JSON-LD document (24 MB expanded). Framing it gets stuck with 1 CPU fully used (little memory is used). I have a few questions:

  • What sizes of JSON-LD documents has framing been tested with?
  • How can I diagnose why the framing gets stuck? Is there a "verbose" mode?
  • Is there any configuration that can help processing larger JSON-LD documents?

jindrichmynarz avatar May 29 '18 09:05 jindrichmynarz

It's possible you're the first to try framing with docs that size. Our use cases have been smaller docs than that. Perhaps someone else in the community has tried larger docs?

If you're running in node, a good first step would be to start up in debug mode and hook in the chrome dev tools. Then you can do a quick profile and maybe it'll be obvious some code has gone exponential.

Maybe also write a quick test using the ruby implementation to see if it has the same problems. Perhaps there's some insight in that code on better data structures to use. https://github.com/ruby-rdf/json-ld

Is the data available somewhere for others to test?

davidlehn avatar May 29 '18 17:05 davidlehn

Thanks for the hints! I'm running the JSON-LD framing in Node via the jsonld-cli.

The data is unfortunately internal and thus unavailable. I can however try generating synthetic data of similar size and structure to see whether it suffers from the same problems.

jindrichmynarz avatar May 29 '18 19:05 jindrichmynarz

Another thing to try is to cut the data size in half, and in half again, etc., and check the timing at each step. I'm guessing it's not going to be a linear performance graph.
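The halving experiment could be scripted roughly as follows. This is only a sketch: `timeHalvings` and `quadraticWork` are hypothetical names, and the quadratic loop is a stand-in workload, not a real framing call. To measure framing itself, `work` would wrap the actual call, and a promise-returning call would need to be awaited before the timer is read.

```javascript
// Sketch: time a function on progressively halved inputs.
const { performance } = require('perf_hooks');

// Runs `work` on the first n items for n = length, length/2, length/4, ...
// and records the wall-clock time of each run.
function timeHalvings(items, work) {
  const results = [];
  for (let n = items.length; n >= 1; n = Math.floor(n / 2)) {
    const slice = items.slice(0, n);
    const start = performance.now();
    work(slice);
    results.push({ size: n, ms: performance.now() - start });
  }
  return results;
}

// Stand-in workload (assumption): deliberately quadratic in the input size.
function quadraticWork(items) {
  let count = 0;
  for (const a of items) {
    for (const b of items) {
      if (a === b) count++;
    }
  }
  return count;
}

const docs = Array.from({ length: 2000 }, (_, i) => i);
const timings = timeHalvings(docs, quadraticWork);
console.table(timings);
```

If the time drops by roughly half at each step, the cost is about linear; a drop by a factor of four suggests quadratic behaviour, and a much steeper drop suggests something worse.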

What is the structure of your dataset? If it's a collection of many similar small items, and it only fails when the number of them is large, it should be fairly easy to build a similar test by algorithmically generating a test data set of any size. If it's some social-graph-like thing, where the links are the problem, it may be harder to simulate.
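A generator for such synthetic data might look like the sketch below. Everything about the shape is an assumption: the `http://example.org/` vocabulary, the `Item` type, and the `name` and `related` properties are made up for illustration. The output is in expanded JSON-LD form, and `linksPerNode` controls how heavily the nodes reference each other, so both the "many small items" case and a more interlinked case can be tried.

```javascript
// Sketch: generate expanded JSON-LD nodes of arbitrary size and link density.
// The vocabulary below is hypothetical.
const EX = 'http://example.org/';

function generateNodes(count, linksPerNode = 0) {
  const nodes = [];
  for (let i = 0; i < count; i++) {
    const node = {
      '@id': `${EX}item/${i}`,
      '@type': [`${EX}Item`],
      [`${EX}name`]: [{ '@value': `Item ${i}` }]
    };
    if (linksPerNode > 0) {
      // Deterministic links to the next few nodes, wrapping around at the end.
      node[`${EX}related`] = [];
      for (let k = 1; k <= linksPerNode; k++) {
        node[`${EX}related`].push({ '@id': `${EX}item/${(i + k) % count}` });
      }
    }
    nodes.push(node);
  }
  return nodes;
}

// Example: 1000 unlinked items versus 1000 items with 5 links each.
const flat = generateNodes(1000);
const linked = generateNodes(1000, 5);
console.log(JSON.stringify(flat).length, JSON.stringify(linked).length);
```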

davidlehn avatar May 30 '18 15:05 davidlehn

I encountered a similar framing problem, where even a 200 kB document can take several tens of seconds. Below is example data and four frames. Processing all four of them takes about 70 seconds in Chrome on an average Core i5. I thought I was doing something wrong, but if @jindrichmynarz also finds framing slow, maybe there actually is something suboptimal in the algorithm? Processing similar documents of sizes up to 150 kB takes just a few seconds, so maybe the problem is the higher amount of interlinking in this one, but I haven't investigated that yet. data.jsonld.txt frames.txt

@jindrichmynarz, have you found any workarounds in your case?

marek-dudas avatar Aug 22 '18 18:08 marek-dudas

I haven't investigated this much more. I tried to frame the larger documents using jsonld-java, but it had similar performance problems, and while I tried profiling the code, I haven't found a clear cause. I think the key question here is to what extent the poor performance is caused by the size of the input data and to what extent by its structure.

jindrichmynarz avatar Aug 25 '18 14:08 jindrichmynarz

Hello,

With this document:

https://gist.github.com/jblemee/41a5c8fa56fffc17896d3b58f42adf43

I got 52% of my CPU time in the function "removeDependents" on the playground (and in my app).

screenshot from 2019-02-05 11-18-35

Here is the function:

    var removeDependents = function removeDependents(id) {
      // get embed keys as a separate array to enable deleting keys in map
      var ids = Object.keys(embeds);
      for (var _i2 = 0; _i2 < ids.length; ++_i2) {
        var next = ids[_i2];
        if (next in embeds && types.isObject(embeds[next].parent) && embeds[next].parent['@id'] === id) {
          delete embeds[next];
          removeDependents(next);
        }
      }
    };

The problem is exponential. With 1/4 of the JSON it works, but each time you add an element to the JSON list, the execution time roughly doubles.

jblemee avatar Feb 05 '19 10:02 jblemee

With a quick glance, I'm not sure if that _removeEmbed code above is exponential itself (?) but it's being called from code looping over ids, so worst case, it probably is. Hopefully it's possible to optimize.
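One possible direction, as a sketch only: the quoted function rescans every key of `embeds` on each recursive call, so a single pass that indexes children by parent `@id` avoids the repeated scans. This is not the jsonld.js implementation. `removeDependentsIndexed`, `removeDependentsNaive`, and the local `isObject` helper are hypothetical names, and `embeds` is passed as a parameter here, whereas in the quoted code it is a variable from the enclosing scope.

```javascript
// Stand-in for the library's object check (assumption).
const isObject = v => Object.prototype.toString.call(v) === '[object Object]';

// The quoted logic, parameterized on `embeds`, kept for comparison.
function removeDependentsNaive(embeds, id) {
  for (const next of Object.keys(embeds)) {
    if (next in embeds && isObject(embeds[next].parent) &&
        embeds[next].parent['@id'] === id) {
      delete embeds[next];
      removeDependentsNaive(embeds, next);
    }
  }
}

// Sketch: index children by parent @id once, then walk the index.
function removeDependentsIndexed(embeds, id) {
  const childrenOf = new Map();
  for (const key of Object.keys(embeds)) {
    const parent = embeds[key].parent;
    if (isObject(parent)) {
      const parentId = parent['@id'];
      if (!childrenOf.has(parentId)) childrenOf.set(parentId, []);
      childrenOf.get(parentId).push(key);
    }
  }
  // Iterative traversal; entries that are already deleted are skipped,
  // which also guards against cycles.
  const stack = [id];
  while (stack.length > 0) {
    const current = stack.pop();
    for (const child of childrenOf.get(current) || []) {
      if (child in embeds) {
        delete embeds[child];
        stack.push(child);
      }
    }
  }
}

const embeds = {
  a: { parent: null },
  b: { parent: { '@id': 'a' } },
  c: { parent: { '@id': 'b' } },
  d: { parent: { '@id': 'x' } }
};
removeDependentsIndexed(embeds, 'a');
console.log(Object.keys(embeds)); // [ 'a', 'd' ]
```

Building the index on every call still costs one scan of the map per call, so if the caller loops over many ids, keeping the index up to date alongside `embeds` would likely help more. Whether this addresses the slowdown reported above is untested.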

davidlehn avatar Feb 05 '19 23:02 davidlehn

Did we just break SoLiD? :-)

happy-dev avatar Feb 06 '19 09:02 happy-dev

No attempt has been made to optimize the remove embed code -- so I suspect there is much that could be gained. We'd be very happy to accept a PR that improved performance.

dlongley avatar Feb 06 '19 15:02 dlongley