gosling.js icon indicating copy to clipboard operation
gosling.js copied to clipboard

Improve performance of json-type by removing deep-clones of spec/data

Open puehringer opened this issue 1 year ago • 4 comments

When inlining long arrays (>10k objects) as values in a spec via the type: json, zooming and panning becomes painfully slow. We identified the key reason for this to be the deep-cloning/structuredClone of the spec which happens in several places. See this (draft) PR for the places where we removed the deep-cloning, without any breaking tests/demos: https://github.com/datavisyn/gosling.js/pull/1

We know that the main point of using higlass is to avoid plotting that many points, however the performance improvements are so drastic that refactoring towards less deep-cloning operations should still be considered.

It should also be noted that the sampleLength property does not have any influence on this, because the bottleneck really is the cloning of the entire values array, causing gosling to be slow even with sampleLength: 100.

Slow demo (v0.9.25)

slow_demo

(ignore the title, it's the slow version)

Fast demo (without the deep-cloning, using the built version of https://github.com/datavisyn/gosling.js/pull/1)

fast_demo

Open questions:

  • [ ] see the ones in https://github.com/datavisyn/gosling.js/pull/1

puehringer avatar Oct 20 '22 07:10 puehringer

Thank you @puehringer for exploring this! The performance improvement is really interesting to see.

The only reason I was using the deep copy frequently was to safely keep the original (or intermediate) specs. For example, this.specComplete copies the original spec so that any values added to this later (e.g., missing values like domain) are not added to the original spec.

https://github.com/gosling-lang/gosling.js/blob/7edb6c0bc31e2245de091f342e8ffb4b07dd4427/src/core/gosling-track-model.ts#L84

https://github.com/gosling-lang/gosling.js/blob/7edb6c0bc31e2245de091f342e8ffb4b07dd4427/src/core/gosling-track-model.ts#L97

Similarly, whenever the Gosling API exposes the spec to users, I am deep cloning before sending it to users so that whenever users edit the spec, it does not affect to the spec used inside the GoslingComponent.

publish(eventType, {
    id: context.viewUid,
    spec: structuredClone(this.options.spec), // instead of spec: `spec: this.options.spec,`
    shape: { x, y, width, height }
});

...
const data = gosRef.current.api.getTrackInfo('id');
data.spec.tracks = []; // does not affect the spec inside GoslingComponent.

But, I did not consider large JSON values sufficiently at the time of implementing this, and perhaps, some cloning can be removed.

Ultimately, if we want to consider large JSON values, we could keep the values outside of the spec after compiling the spec (e.g., removing values fields from the spec and keeping it in the gosling–track-model separately) so that cloning spec afterward does not decrease the performance.

sehilyi avatar Oct 29 '22 23:10 sehilyi

(Partly related to https://github.com/gosling-lang/gosling.js/issues/508)

sehilyi avatar Oct 30 '22 01:10 sehilyi

@sehilyi I worked on a more concrete example (i.e. with inlined JSON) which can also be used in your editor. It is a simple alignment vis with inlined data for multiple species. This should help reproducing the issues: json_demo.json. I think this is a pretty good benchmark, as it should be possible to make this visualization super smooth. Let me know if we can help with anything, I think https://github.com/datavisyn/gosling.js/pull/1 is a starting point. I also get your points of passing copies around for immutability purposes, and I agree that it should improve performance drastically if spec and (inline json) data would be kept separate.

And here is a quick and easy way to check what is being cloned (it just overrides JSON.stringify, but there is also structuredClone calls). The demo shows how often it is called when zooming/panning, and including the copying of the (large) original data array.

const orig = JSON.stringify;
JSON.stringify = (...args) => {
    console.log("STRINGIFY", args)
    return orig(...args);
}

stringify.webm

puehringer avatar Oct 27 '23 08:10 puehringer

Hi @puehringer, thanks for the comment and for looking into this more in detail. The json example you provided will be very helpful. I've started to set up performance testing infrastructure and will use this as my first example to optimize.

etowahadams avatar Oct 27 '23 16:10 etowahadams