Evaluate usage of tracker clock to ensure it's logically correct
To recap, the tracker uses an implementation of an Interval Tree Clock to resolve the causal history of replicated events. My understanding is that an ITC scenario plays out like so (see the sketch in code after the list):
- Node A is created as a seed clock
- Node B is created by forking A's clock
- Event X occurs on A, incrementing its event clock
- When A syncs with B or vice versa, we can compare the clocks and see that B's clock is strictly in the past of A's, so we can safely update B's state
- If events occur on both A and B, their clocks will conflict, which requires us to resolve the conflicting state in some way, and then make sure the clocks are synchronized before proceeding
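As a minimal sketch of that lifecycle (assuming Swarm's Swarm.IntervalTreeClock module with the usual seed/fork/event/leq operations):

```elixir
alias Swarm.IntervalTreeClock, as: Clock

# node A starts from the seed clock; node B forks from it
a = Clock.seed()
{a, b} = Clock.fork(a)

# an event on A bumps its event component
a = Clock.event(a)

# B is now strictly in A's past, so applying A's state to B is safe
true  = Clock.leq(b, a)
false = Clock.leq(a, b)

# if B also records an event, neither clock dominates: a conflict
b = Clock.event(b)
false = Clock.leq(a, b)
false = Clock.leq(b, a)
```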
In the case of Swarm, we can deterministically resolve conflicts because at any given point in time we know which node in the cluster "owns" a given process/registration, with the exception of processes which are registered but not managed by Swarm. We can ask the conflicted process to shut down, or kill it, and the registration will remain with the "correct" version (a toy sketch of this follows below). However, since synchronization today works by sending the whole registry to the remote node, I'm not sure a clock is strictly needed. If we assume that it does help us, though, we should ensure that our logic for handling the clock forking/peeking/incrementing is correct.
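To illustrate the idea, here is a toy sketch (not Swarm's actual resolution code; the hash-based owner_node/1 stands in for the real distribution strategy):

```elixir
defmodule ConflictResolution do
  # Toy ownership function: hash the name onto the sorted node list.
  # Any deterministic mapping works, as long as every node in the
  # cluster arrives at the same answer for the same name.
  def owner_node(name) do
    nodes = Enum.sort([Node.self() | Node.list()])
    Enum.at(nodes, :erlang.phash2(name, length(nodes)))
  end

  # Keep whichever pid lives on the owning node; kill the duplicate.
  def resolve(name, local_pid, remote_pid) do
    if owner_node(name) == node(local_pid) do
      Process.exit(remote_pid, :kill)
      local_pid
    else
      Process.exit(local_pid, :kill)
      remote_pid
    end
  end
end
```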
I have noticed some failures in Swarm when ITCs are compared, and realized that the Clock.peek() operation is broken: it returns an invalid structure which is not compatible with the other operations, which means that peeked clocks cannot be compared or joined. I put up a pull request with a fix: https://github.com/bitwalker/swarm/pull/77
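For reference, the ITC paper defines peek as producing an anonymous copy of a clock: the event tree is kept and the identity is replaced with the null id, so the result remains a valid clock. Assuming an {id, event} tuple representation, a correct peek would look roughly like:

```elixir
# keep the causal history, anonymize the identity (null id = 0),
# so the peeked clock can still be compared (leq) and joined
def peek({_id, event}), do: {0, event}
```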
While I was investigating this problem, I reviewed the general use of ITCs, and it seems the current implementation is flawed.
Here are a few rules which have to be followed to make ITCs work:
- Never store a remote clock in the local registry. Remote clocks either have no identity (if peeked) or carry the identity of another node, so storing them locally is always wrong. There are many places which do exactly that, e.g. https://github.com/bitwalker/swarm/blob/e88966cad465477eb935b96c83e13c080b1a138b/lib/swarm/tracker/tracker.ex#L889
  Peeked clocks in particular should never be stored, because the local identity would otherwise be lost: e.g. https://github.com/bitwalker/swarm/blob/e88966cad465477eb935b96c83e13c080b1a138b/lib/swarm/tracker/tracker.ex#L654
- Use Clock.join() to receive causal information from remote (peeked) clocks. Clock.join() is not used anywhere in the Tracker, which means that causal information from other nodes is never updated after a broadcast. In order to use Clock.join() with peeked clocks, the ITC implementation has to be fixed first.
- Always call Clock.peek() before sending clocks to other replicas. This avoids a couple of bugs where a node ends up storing a clock with the wrong identity.
- Make sure that every node gets a properly forked clock when joining the cluster. This is actually the most complicated piece to fix; see the sketch below.
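A hedged sketch of what the join handshake needs to guarantee; the message shape and variable names here are illustrative, not Swarm's actual sync protocol:

```elixir
# On an existing member, when a new tracker joins: fork the local
# clock, keep one half, and send the other, so the joiner never
# seeds a fresh, unrelated clock of its own.
{lclock, rclock} = Clock.fork(state.clock)
send(new_tracker_pid, {:clock_fork, rclock})
state = %{state | clock: lclock}

# On the joining node: adopt the received half as the local clock.
receive do
  {:clock_fork, clock} -> %{state | clock: clock}
end
```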
Let's take a look at an example (handle_replica_event for :update_meta):
- It should not merge the metadata when the remote clock wins, otherwise remove_meta calls might get lost: https://github.com/bitwalker/swarm/blob/e88966cad465477eb935b96c83e13c080b1a138b/lib/swarm/tracker/tracker.ex#L888
- It should not take any data when the local clock wins: https://github.com/bitwalker/swarm/blob/e88966cad465477eb935b96c83e13c080b1a138b/lib/swarm/tracker/tracker.ex#L892
- It should not store the remote clock in the local registry: https://github.com/bitwalker/swarm/blob/e88966cad465477eb935b96c83e13c080b1a138b/lib/swarm/tracker/tracker.ex#L889
I think the code should look like this:
```elixir
cond do
  Clock.leq(lclock, rclock) ->
    # The remote event causally dominates the local data.
    # Join the peeked remote clock to take over causal information
    # from the other node
    nclock = Clock.join(lclock, rclock)
    # Update the metadata and clock
    Registry.update(name, [meta: new_meta, clock: nclock])

  Clock.leq(rclock, lclock) ->
    # This node has data which causally dominates the remote;
    # ignore the event when the local clock wins
    :ok

  :else ->
    # Concurrent updates: take the last-writer-wins approach for now
    new_meta = Map.merge(old_meta, new_meta)
    # Join and bump our local clock, then re-broadcast the update
    # to ensure we converge
    nclock = Clock.event(Clock.join(lclock, rclock))
    Registry.update(name, [meta: new_meta, clock: nclock])
    debug "conflicting meta for #{inspect name}, updating and notifying other nodes"
    broadcast_event(state.nodes, Clock.peek(nclock), {:update_meta, new_meta, pid})
end
```
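Note that the conflict branch re-broadcasts Clock.peek(nclock) rather than nclock itself, in line with the third rule above: other replicas should only ever receive an anonymized copy of the local clock.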
I found this issue because there is one process in our swarm of 3 nodes that does not start, and we get this warning:
09:11:40.932 [warn] [swarm on node1@49800a15e1cc] [tracker:handle_replica_event] received track event for "device-1234", mismatched pids, local clock conflicts with remote clock, event unhandled
We have 3 nodes running, but this warning only occurs on one of them (49800a15e1cc).
Strangely, Swarm.whereis_name always returns the same PID, even though the corresponding process is not running (anymore).
Is this a problem that will be fixed in Swarm? Is there a workaround for this?
I've got some pending work which addresses the usage of clocks inside Swarm, but I'm currently in the middle of a big move, so I've had to shelve things for a bit until I get settled in my new house - this is definitely on my list of priorities!
This issue is starting to occur more frequently in our system. I hope it can be fixed in the next release so we can keep using this nice package :)
We're running into the same warning/error as @h4cc.
I'm currently working on a new causally consistent model for Swarm, but in the meantime I believe we recently merged a PR which included fixes to the clock usage; you may want to give that a shot until I can get a release out.