bdr Adding documentation regarding the importance of a single time source…

… on conflict handlers

see https://github.com/2ndQuadrant/bdr/issues/109

Jul 10 '15 10:07 gilesw

Hi,

On 2015-07-10 03:14:16 -0700, gilesw wrote:

Adding documentation regarding the importance of a single time source on conflict handlers

If you have a conflict when the time on two nodes is out of sync, the conflict
may never be resolved because the last update time will never match,
even after the handler has run. This will manifest itself as row updates only
syncing in one direction.

That shouldn't actually be happening. Out-of-sync clocks should result in the "wrong row" winning, but the conflict should nevertheless be resolved.
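To illustrate the point about the "wrong row" winning, here is a minimal Python sketch of a generic last-update-wins rule. It is not BDR's implementation: the `Version` class, the `last_update_wins` function, the node-name tie-break, and the 45-minute clock offset are all assumptions made up for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Version:
    node: str              # hypothetical node name
    value: str             # the row content being written
    commit_time: datetime  # timestamp taken from the writing node's own clock


def last_update_wins(local: Version, remote: Version) -> Version:
    """Keep the version with the later commit timestamp.

    Ties are broken by node name (an assumption of this sketch) so that
    every node picks the same winner.
    """
    return max(local, remote, key=lambda v: (v.commit_time, v.node))


true_time = datetime(2015, 7, 8, 16, 0, tzinfo=timezone.utc)

# Node a writes first, but its clock runs 45 minutes fast.
a = Version("a", "final7", true_time + timedelta(minutes=45))
# Node b writes 10 real minutes later with a correct clock.
b = Version("b", "final8", true_time + timedelta(minutes=10))

# Each node applies the same rule to the same pair of versions.
winner_on_a = last_update_wins(local=a, remote=b)
winner_on_b = last_update_wins(local=b, remote=a)

print(winner_on_a.value, winner_on_b.value)
```

In this sketch both nodes settle on node a's row even though node b's write happened later in real time. The conflict is resolved and the nodes converge, but the older write wins because of the skewed clock.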

Jul 10 '15 10:07 anarazel

Hi anarazel,

I've corrected the time source now, but the steps I used to create the conflict were:

For an update/update conflict I powered down node a and updated on node b. Then I powered down node b, updated node a, and powered node b back on.

conflict_id              | 860
local_node_sysid         | 6166345561721046825
local_conflict_xid       | 4990
local_conflict_lsn       | 0/CA06DD98
local_conflict_time      | 2015-07-08 16:30:29.276713+00
object_schema            | public
object_name              | table
remote_node_sysid        | 6166334043667378995
remote_txid              | 3114
remote_commit_time       | 2015-07-08 15:45:44.999168+00
remote_commit_lsn        | 1/4104FF98
conflict_type            | update_update
conflict_resolution      | last_update_wins_keep_local
local_tuple              | {"table_id":1452776,"last_update_id":"xxx","password":"obs","username":"final7","acc_id":1,"last_update_time":"2015-07-08T16:16:34.854137+00:00","make_public":false}
remote_tuple             | {"table_id":1452776,"last_update_id":"xxx","password":"obs","username":"final8","acc_id":1,"last_update_time":"2015-07-08T15:45:44.994954+00:00","make_public":false}
local_tuple_xmin         | 4988
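As a quick check on the record above, the two `last_update_time` values can be compared directly. This Python sketch only copies the timestamps from the `local_tuple` and `remote_tuple` fields. Which fields BDR itself compares when it resolves the conflict is not stated in this thread, so treating these two as the deciding values is an assumption.

```python
from datetime import datetime

# Values copied from the last_update_time fields of the conflict record.
local_last_update = datetime.fromisoformat("2015-07-08T16:16:34.854137+00:00")
remote_last_update = datetime.fromisoformat("2015-07-08T15:45:44.994954+00:00")

# How far the local timestamp is ahead of the remote one.
gap = local_last_update - remote_last_update

# A last-update-wins rule keeps the local row when its timestamp is later.
keeps_local = local_last_update > remote_last_update

print(f"local is ahead of remote by {gap}; keep local: {keeps_local}")
```

The local timestamp is the later of the two, which is consistent with the logged resolution `last_update_wins_keep_local`.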

It did have me stumped for a good while, which is why I submitted the issue for the doc update. As soon as the time source was corrected, though, the syncing was bi-directional again.

I did try to do some more diagnosis by clearing out the conflict history to try and log each step, but I got into an infinite conflict loop. If you delete the conflict history on each node you actually generate a delete/delete conflict. Do you want me to submit this as a bug, or is there a purge function that I'm missing?

Jul 10 '15 10:07 gilesw

If you delete the conflict history on each node you actually generate a delete/delete conflict

Hm. We don't replicate inserts into the conflict history table from the conflict tracking code, but maybe we don't filter out subsequent SQL-level updates/deletes on the table? I'll need to check. Creating a new bug.

Mar 18 '16 02:03 ringerc

It sounds like we need to reproduce this and fix the underlying bug with desynchronized time causing failure to resolve.

@gilesw Can you supply a more detailed set of steps to reproduce this? BDR setup commands, DDL, and the SQL run on each node to create the issue?

Mar 18 '16 02:03 ringerc
