diff icon indicating copy to clipboard operation
diff copied to clipboard

Diffing array is very inefficient if deletion/addition occurs anywhere but at the end of the array

Open tnrich opened this issue 9 years ago • 12 comments

Often an array will have an element inserted or deleted from some index within the array. As it stands, deepDiff seems to treat such an event as an insertion or deletion of the final member of the array, and an edit of all the other members. Example:

When a "c" is inserted into the first array: var a = [h, a, p, p, y] var b = [c, h, a, p, p, y]

diff(a,b) will output a bunch of edited values for the array values 0-4 and say that the y has been inserted.

Ideally, for memory savings at least, we could use some sort of fuzzy alignment of the objects being held in the two arrays to find the best match-up of indices and then have a way of notating insertions and deletions.

Do you think this is a feature you could see building in the future? Would your diff system be amenable to something like this or would it break too many aspects of it?

Also note I would be interested in helping build any such system.

Thanks for you time!

tnrich avatar Jan 28 '15 22:01 tnrich

The short answer is that I have been unsatisfied with my initial, simplistic approach to array difference. I would love for it to be more robust.

There are consequences to deeper analysis though. Namely, it will degrade performance somewhat due to a multitude of new comparisons on arrays and their items.

I would love to see some code nonetheless because the current implementation is too weak. In the end, if the performance is poor when figuring out array insertions, moves, etc. we can make the array analysis an optional feature.

flitbit avatar Jan 29 '15 16:01 flitbit

You could use something like Levenshtein distance for this.

nolanlawson avatar Feb 01 '15 05:02 nolanlawson

jsondiffpatch does this with an LCS algorithm

tmcw avatar May 06 '15 15:05 tmcw

@tnrich While it might be memory inefficient, it does mean that the actual running time of diff is faster. The resulting output might make your program run slower, of course, if processing each change record is processing intensive. The alternative is to have a slower differencing algorithm that takes multiple paths and chooses the one that produces the fewest number of changes. This is potentially much slower, tho I have no idea how much.

For me tho, that's what I want, so I'll probably end up building it. If I do I'll report back here.

fresheneesz avatar Jul 02 '15 01:07 fresheneesz

So I created a differencing module that solves this issue: https://github.com/Tixit/odiff

fresheneesz avatar Jul 06 '15 20:07 fresheneesz

Hey @flitbit & @tnrich - I created a benchmark suite to test multiple JSON diff frameworks head-to-head: https://github.com/justsml/json-diff-performance

It's coverage is hardly complete, but it's pretty easy to extend - main goals were testing ops/sec and output string byte length.

@fresheneesz I also recently added support for your odiff library!

Hope this helps in performance tuning and comparing various library techniques.

justsml avatar Apr 05 '16 14:04 justsml

@justsml Very cool!

fresheneesz avatar Apr 05 '16 22:04 fresheneesz

Yo @justsml thanks for a motivation to finally optimize this thing! Time to turbocharge the MVP... I'll report back soon.

flitbit avatar May 08 '16 17:05 flitbit

Any updates?

waieez avatar Apr 14 '17 03:04 waieez

Very cool lib, great work, but this limitation with array comparisons is forcing me to look elsewhere. I don't have the technical prowess to offer a solution, but just adding my two cents that it'd be great to have.

mikechabot avatar Jun 26 '17 20:06 mikechabot

@mikechabot I agree with you. I also have this issue now and need to look into other libraries - only because of this issue.

pschaub avatar Jan 10 '18 13:01 pschaub

@pSchaub FWIW, this is working great for me: https://github.com/benjamine/jsondiffpatch. You will need to write your own algo to consume the delta report, but his format is well documented:

https://github.com/benjamine/jsondiffpatch/blob/master/docs/deltas.md

mikechabot avatar Jan 10 '18 15:01 mikechabot