JSON-Patch
JSON-Patch copied to clipboard
Potential performance improvments
Hey,
Along side https://github.com/Starcounter-Jack/JSON-Patch/pull/149, I wanted to share some improvements I made on our fork. We managed to get 70x sped up in some cases.
The reasons I'm submitting this as a discussion and not as a PR as these are some breaking changes, but I figured it's worth sharing.
We use fast-json-patch to simulate immutable data in our system. We compare the old data against a newer set, and update the paths in the old data where we find a difference.
The first change we did was removing all the cloning work. In our specific use case, we found we didn't need to apply cloning because it was safe to reuse references from the new object.
The second thing was to the store paths as an array of strings instead of a slash string. I realize the RFC states it's stored as a string, but using compare & apply meant a lot of string concat + string splitting that was unnecessary.
The third step we added was a 'prefilter'. When walking down the tree, we called into a function that let use stop early on. We had some specific cases where we knew what paths should change and which paths were safe to ignore.
Wow, thanks for sharing your ideas! 70x speed up sounds impressive!
We will get back to longer discussion once @alshakero is back from vacation, so we could involve entire team.
- Removing cloning sounds like a heavy performance improvement, but it may also break a lot. We will have to review all our use-cases and consult it with our clients and users. Maybe we could implement is behind some flag.
- I really like this one! Even though RFC uses strings and we have to support string on I/O, using array inside our implementation and as an optional alternative for I/O sound like a neat solution.
- I'm not sure I get you correctly. Is that just an optional callback for filtering just to ignore some operations? We use similar thing (
ignoreAddregexp) in Palindrom (former PuppetJs) if we could add such feature in inexpensive manner (for regular uses), I'd say we should do that.
//cc @warpech
Thanks for your will to contribute! As long as all tests are passing, and there is no degradation in the benchmark (CONTRIBUTING.md, CTRL+F bench), we should strongly consider it.
@tomalec @warpech Are there any of these you would want prioritized? Otherwise I'll probably implement the no-clone flag
I have no personal preference, maybe just start with simplest or the one making the biggest impact. //cc @alshakero
Terribly sorry for the late response. I suggest holding your horses. Because we'll be releasing 2.0 quite soon and it's more modular. All your changes will be in one place (instead of two almost identical files like now) and I have a couple performance improvements in mind already. Waiting for that will make both our lives easier.
@KamranAsif is your fork available on GitHub?
I noticed that the _generate method is slower on version 2.x than it used to be on 1.x.
@endel I don't think that fork is available publicly. Its owned by @schrodinger, @wcjordan could open source it.
I've created my own lib for my needs: KamranAsif/evil-diff
Sure, it should be open since it's an open source license, but someone probably fat fingered making the repo private.
@KamranAsif @endel it should be public now: https://github.com/schrodinger/JSON-Patch Shouldn't have ever been private.
@wcjordan @KamranAsif @endel Thanks for Open-sourcing your repo! :)
Since it's not marked on Github as a fork, it's harder to compare this two. Is the 1.1.4 the last version at which you forked?
Here is the diff, I managed to get from Github https://github.com/schrodinger/JSON-Patch/compare/58202f4ca0a2034c1e5dcdb23f49b60c6534e914...master
I hope we can merge some of those ideas to fast-json-patch, and check with new benchmarks if, how much, and in which cases we improve.
Has this effort been abandoned? I am looking to apply this to a potentially massive data set that uses immutable objects, so I particularly need the lack of cloning and reference reuse.
To anyone seeing this: If you want structural sharing, you can use immer's applyPatches function (docs). It's not completely RFC-conformant since it uses a different path format, but they can be easily converted like this (example in typescript):
import { Operation } from "fast-json-patch";
import * as immer from 'immer';
export default function jsonPatchesToImmerPatches(jsonPatches: Operation[]): immer.Patch[] {
return jsonPatches.map(x => ({
...x,
path: jsonPathToArray(x.path)
})) as immer.Patch[];
}
function jsonPathToArray(path: string): string[] {
// The JSON Patch definition defines the path as a RFC6901 JSON Path [1].
// The JSON Path definition contains some escaping rules that we have to
// follow [2].
// [1]: https://datatracker.ietf.org/doc/html/rfc6902#section-4
// [2]: https://datatracker.ietf.org/doc/html/rfc6901#section-3
// remove the leading slash, split at the others
const parts = path.replace(/^\//, "").split("/");
// replace escaped characters (~1 and ~0)
const unescaped = parts.map(x => x.replaceAll("~1", "/").replaceAll("~0", "~"));
return unescaped;
}
Immer definitely uses structural sharing and thus does not clone objects that didn't change. You could probably also use immer.produce instead, but that likely incurs much more overhead.
@tomalec @warpech @alshakero I'm looking to revive this issue. From what the discussion looks like so far, I'm looking to implement:
- A "prefilter" (ignorePath might be a better name?) callback for the compare operation that enables consumers to opt out certain paths in the comparison.
export declare function compare(tree1: Object | Array<any>, tree2: Object | Array<any>, invertible?: boolean, prefilter?: (path: string, key: string) => boolean): Operation[];
Our use case for this is generating patches from partial versions of the original object:
const oldObj = { a: 1, b: 2, c: 3 };
const partialNewObj: { b: 4 };
compare(oldObj, partialNewObj, undefined, (path) => path !== '/b'); // only replace b, don't remove a and c
- A flag that consumers can use to opt out of deep-cloning of objects when generating patches.
Does that look good to folks? I can work on getting a couple PRs out soon.