go-ds-crdt icon indicating copy to clipboard operation
go-ds-crdt copied to clipboard

Non-Converging Operations

Open dozyio opened this issue 1 year ago • 9 comments

Hi,

I'm currently working on a JS port and encountered an edge case during interop testing where the values do not converge as expected, despite the DAGs being identical.

When performing a sequence of put, put, delete operations on replica0, and a single put on replica1 (all on the same key), replica0 reverts to the second put operation, while replica1 retains its initial put. This causes the replicas to hold different values even though their DAGs are the same.

Any ideas?

Test case below

func TestCRDTPutPutDelete(t *testing.T) {
	replicas, closeReplicas := makeNReplicas(t, 2, nil)
	defer closeReplicas()

	br0 := replicas[0].broadcaster.(*mockBroadcaster)
	br0.dropProb = 101

	br1 := replicas[1].broadcaster.(*mockBroadcaster)
	br1.dropProb = 101

	k := ds.NewKey("k1")

	// r0 - put put delete
	err := replicas[0].Put(context.Background(), k, []byte("r0-1"))
	if err != nil {
		t.Fatal(err)
	}
	err = replicas[0].Put(context.Background(), k, []byte("r0-2"))
	if err != nil {
		t.Fatal(err)
	}
	err = replicas[0].Delete(context.Background(), k)
	if err != nil {
		t.Fatal(err)
	}

	// r1 - put
	err = replicas[1].Put(context.Background(), k, []byte("r1-1"))
	if err != nil {
		t.Fatal(err)
	}

	br0.dropProb = 0
	br1.dropProb = 0

	time.Sleep(15 * time.Second)

	r0Res, err := replicas[0].Get(context.Background(), ds.NewKey("k1"))
	if err != nil {
		if !errors.Is(err, ds.ErrNotFound) {
			t.Fatal(err)
		}
	}

	r1Res, err := replicas[1].Get(context.Background(), ds.NewKey("k1"))
	if err != nil {
		t.Fatal(err)
	}

	fmt.Printf("r0Res: %s\nr1Res: %s\n", string(r0Res), string(r1Res))
	if string(r0Res) != string(r1Res) {
		t.Log("r0 dag")
		replicas[0].PrintDAG()

		t.Log("r1 dag")
		replicas[1].PrintDAG()

		t.Fatal("r0 and r1 should have the same value")
	}
}
go test -test.count 1 -v -run TestCRDTPutPutDelete ./...
?   	github.com/ipfs/go-ds-crdt/pb	[no test files]
=== RUN   TestCRDTPutPutDelete
r0Res: r0-2
r1Res: r1-1
    crdt_test.go:536: r0 dag
- 1 | x3nq: Add: {/k1:r1-1,}. Rmv: {}. Links: {}:
- 3 | rn6q: Add: {}. Rmv: {/k1,/k1,}. Links: {obqq,}:
 - 2 | obqq: Add: {/k1:r0-2,}. Rmv: {}. Links: {43z4,}:
  - 1 | 43z4: Add: {/k1:r0-1,}. Rmv: {}. Links: {}:
    crdt_test.go:539: r1 dag
- 1 | x3nq: Add: {/k1:r1-1,}. Rmv: {}. Links: {}:
- 3 | rn6q: Add: {}. Rmv: {/k1,/k1,}. Links: {obqq,}:
 - 2 | obqq: Add: {/k1:r0-2,}. Rmv: {}. Links: {43z4,}:
  - 1 | 43z4: Add: {/k1:r0-1,}. Rmv: {}. Links: {}:
    crdt_test.go:542: r0 and r1 should have the same value
--- FAIL: TestCRDTPutPutDelete (30.00s)

dozyio avatar Sep 01 '24 10:09 dozyio

Thank you for submitting your first issue to this repository! A maintainer will be here shortly to triage and review. In the meantime, please double-check that you have provided all the necessary information to make this process easy! Any information that can help save additional round trips is useful! We currently aim to give initial feedback within two business days. If this does not happen, feel free to leave a comment. Please keep an eye on how this issue will be labeled, as labels give an overview of priorities, assignments and additional actions requested by the maintainers:

  • "Priority" labels will show how urgent this is for the team.
  • "Status" labels will show if this is ready to be worked on, blocked, or in progress.
  • "Need" labels will indicate if additional input or analysis is required.

Finally, remember to use https://discuss.ipfs.io if you just need general support.

welcome[bot] avatar Sep 01 '24 10:09 welcome[bot]

@dozyio I see you are writing in JS, which protobuf library are you using? I am not familiar with this project, but in GO/JS interop of other IPFS projects we've seen discrepancy in sorting/serialization due to JS library producing different bytes.

Things you may double-check what happens on removal (seen them being different across libraries):

  • are remaining Keys/fields sorted in the same order
  • what happens when a field like Links is empty (is it skipped, or present but empty)

FYSA Helia uses https://www.npmjs.com/package/protons

lidel avatar Sep 03 '24 16:09 lidel

I can reproduce this... but it seems to work when dropProb = 0 at the beginning.

I need to look further, but it might be that PrintDAG() is just printing blocks that have not been processed by the replicas (seems tests use a common blockstore?).. So it prints the same DAG because the DAGSyncer does not sync anything really as it is a single underlying blockstore. Instead perhaps some entries in that blockstore were not broadcasted.

Why not? I'm not sure but it can give you a lead to look into, otherwise I will look again when I have some time. There is a processedBlocks namespace and perhaps printDAG should check against that list and print info on whether a block is processed or not.

hsanjuan avatar Sep 03 '24 23:09 hsanjuan

Thanks @lidel @hsanjuan

That gives me something to work with - will try debugging over the next few days

dozyio avatar Sep 04 '24 08:09 dozyio

This is a bug! Good find @dozyio . I'm trying to come up with a fix.

hsanjuan avatar Sep 05 '24 20:09 hsanjuan

@hsanjuan

I've had a few hours to try and debug this but it starts getting quite messy and no luck as yet. From my limited understanding of AWOR sets I think the converged key should be deleted as it has the highest priority - is that correct?

dozyio avatar Sep 05 '24 21:09 dozyio

The problem is that while r0 has tombstoned both writes, when the x3nq update arrives (it is the last), it does not realize that the current-value (r0-2, priority 3) has been tombstoned and therefore (r1-1, priority 1) is actually the value that needs to take effect.

replica 1 writes r1-1 first and then it never replaces it, as it processes the other updates later.

hsanjuan avatar Sep 05 '24 21:09 hsanjuan

The fix needs to touch the value/priority entries, either when putting Tombstones (to leave the correct value/priority) or when putting new values (to check that the current value/priority has not been tombstoned). I need to sleep on the approach...

hsanjuan avatar Sep 05 '24 21:09 hsanjuan

Thanks for looking into it!

dozyio avatar Sep 05 '24 21:09 dozyio