jsonpatch icon indicating copy to clipboard operation
jsonpatch copied to clipboard

Option to use unique identifier when diffing lists

Open Valian opened this issue 1 year ago • 7 comments

Is your feature request related to a problem? Please describe.

Currently items in lists are compared by index.

In some cases, list patches might be much smaller if we would use unique identifier present in items to distinguish if they were moved around. For example:

original = [
  %{id: 1, name: "test1"},
  %{id: 2, name: "test2"},
  %{id: 3, name: "test3"},
  %{id: 4, name: "test4"},
  %{id: 5, name: "test5"},
  %{id: 6, name: "test6"},
]

updated = List.insert_at(original, 0, %{id: 123, name: "test123"})

Jsonpatch.diff(original, updated)

Output:

[
  %{value: %{id: 6, name: "test6"}, path: "/6", op: "add"},
  %{value: "test5", path: "/5/name", op: "replace"},
  %{value: 5, path: "/5/id", op: "replace"},
  %{value: "test4", path: "/4/name", op: "replace"},
  %{value: 4, path: "/4/id", op: "replace"},
  %{value: "test3", path: "/3/name", op: "replace"},
  %{value: 3, path: "/3/id", op: "replace"},
  %{value: "test2", path: "/2/name", op: "replace"},
  %{value: 2, path: "/2/id", op: "replace"},
  %{value: "test1", path: "/1/name", op: "replace"},
  %{value: 1, path: "/1/id", op: "replace"},
  %{value: "test123", path: "/0/name", op: "replace"},
  %{value: 123, path: "/0/id", op: "replace"}
]

Describe the solution you'd like I'd love to have additional option to diff being a function returning unique identifier for a given object &object_hash/1 that would trigger different diffing mechanism for lists.

Output I'd like to get for the previous example:

Jsonpatch.diff(original, updated, object_hash: fn %{id: id} -> id end)

[
  %{value: %{id: 123, name: "test123"}, path: "/0", op: "add"}
]

Describe alternatives you've considered Some JS libraries support such option, eg jsondiffpatch

Additional context I'm planning to use json_patch in LiveVue to reduce payload size. Already playing with it. It's very common that items in lists have unique identifiers (eg. id) that should make it possible to create much more optimized patches.

Valian avatar Sep 04 '24 11:09 Valian

Hi @Valian

I checkout jsondiffpatch. It sounds like a neat feature. I will plan it for version 2.3.

corka149 avatar Sep 07 '24 18:09 corka149

I wonder how much work it would take for CursorAI to implement 😅

I may try to give it a quick shot and see.

Valian avatar Sep 07 '24 19:09 Valian

My time is a heavily limited but the feature is not forgotten. :laughing:

corka149 avatar Sep 21 '24 18:09 corka149

I've tried working on this, but it's not trivial. We have to generate removals, additions, and then adjust indexes of all elements and see if any "moves" are necessary.

Well by saying "tried working on this" I meant having a lengthy discussion with CursorAI and getting some starting implementation and tests, but sadly it's not really up to the standard I'd like it to be. And still has bugs 😅

Valian avatar Sep 27 '24 12:09 Valian

In my head it would require to create a full hash-to-index map for source and then use it while checking the target only for moves. All detected moves will be ignored in the next step where the regular diff algorithm will run to detect changes, additions and deletions. But so far I had no chance to test it. 😕

corka149 avatar Sep 27 '24 13:09 corka149

Hmm... would it work for my initial example?

Eg. element with id 1 moved from index: 0 to index: 1 but that was not a real move, rather a result of inserting another item at index 0.

So personally I'm thinking about getting that full hash-to-index map, finding all additions and removals, and then updating the indexes of that map so we can find "real" moves (that are not results of additions / removals).

So in the previous example, after discovering insertion at index 0 we would increase indexes of all source elements which are >= 0 to accomodate for that. And then checking for moves will properly give us nothing.

Valian avatar Sep 27 '24 14:09 Valian

Oh, yes. The expected new index needs to be calculated for all subsequent items.

corka149 avatar Sep 29 '24 17:09 corka149

Still not forgotten. 😄

I am thinking back and forth about this. It seems not to be a standard feature for a library that create diffs (inspiration of this lib jsonpatch.make_patch) and I see why.

It is not easy to say when it makes sense or not. When someone already know what matters, then they could create manually pretty simple the patch. The current implementation might be "dump" but it is a robust one.

corka149 avatar Nov 09 '24 19:11 corka149

At the moment it doesnt fit into this library. But I am open for good PRs.

corka149 avatar Dec 07 '24 19:12 corka149

Ok, thank you @corka149! Probably I'll try implementing this in my own fork soon and if it won't match your vision of jsonpatch I'll either publish it as a new lib or pull it directly into LiveVue :+1:

At the moment it doesnt fit into this library

Just last question, why? In my opinion it might allow to create much smaller diffs for certain cases, and it can't be done in a different place than in the library creating the diff.

Valian avatar Dec 07 '24 21:12 Valian

It is somehow not versatile and almost very specific to an use case. Yes, this library allows creating the diffs but it also allows to create patches "manually".

Another blocker for me is to distinguish when the object_hash function shall be used or not when one is provided. Maybe an data structure has several list. The object_hash function could check them all. But is this intended?

corka149 avatar Dec 07 '24 22:12 corka149

Yes, I believe if it's provided it should be applied to them all. It might be possible to implement a way out - If function returns nil then we could use normal diffing mechanism. Then it might be as easy as:

Jsonpatch.diff(original, updated, object_hash: fn 
  %MyStruct{name: name} -> name
  %{id: id} -> id 
  _ -> nil
end)

or object_hash might accept also 2nd argument being path of a list, so it could be applied selectively.

I fully understand if it might be too much for that library 😅 Just it would be very helpful to my use case.

Valian avatar Dec 07 '24 22:12 Valian

When your root object is the list, and always will be, it will be much simpler to implement it for that case then adding it as a generic solution. At least that is my expectation 😂

Feel free to open an issue when you need help. 🙏

corka149 avatar Dec 08 '24 12:12 corka149

Well I don't have any guarantees of it being a list. I'm basically trying to reduce diff of sending live view assigns rendered as json. So in essence it's an object with any values inside. Let me play with it and see!

Valian avatar Dec 08 '24 20:12 Valian