[CURATOR-330] Need a way to handle connection lost while entering double barrier

Open jira-importer opened this issue 9 years ago • 0 comments

Here is the problem I’m running into:

Assume a 3-node ensemble. My application has 3 clients (Client 1, 2 and 3), and each one runs on the same host as one of the zk nodes. They use a double barrier for coordination.

Client 1 is entering the barrier and waiting for the other 2. Now the other 2 nodes go down, the ensemble crashes, and Client 1 gets a LostConnectionException from enter(). That’s expected.

After a while the other 2 nodes come back, and all clients need to retry the operation and re-enter the same barrier (creating a new barrier might make things more complex). Here is the problem:

If the session for Client 1 is still alive, Client 1 calling enter() again will get a NodeExistsException, because the ephemeral node belonging to that session has not been deleted yet.

In this case, what should I do from the application side? Alternatively, could we add a mechanism to re-enter the barrier that skips creating the child node for this client if it already exists?
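
To make the re-entry idea concrete, here is a minimal in-memory sketch. It is not Curator code and not how DistributedDoubleBarrier is implemented: `BarrierModel`, `createOrReuse`, and the local `NodeExistsException` are hypothetical names, and a plain map stands in for the barrier's child znodes. It only illustrates the rule "reuse the child if the same session already owns it, otherwise fail as before". Against a real ensemble, the ownership check would presumably compare the node's ephemeral owner with the client's current session id.

```java
import java.util.HashMap;
import java.util.Map;

/** In-memory stand-in for the barrier's parent znode (hypothetical, not Curator). */
class BarrierModel {
    /** Hypothetical unchecked exception mirroring ZooKeeper's "node exists" error. */
    static class NodeExistsException extends RuntimeException {
        NodeExistsException(String path) { super("node exists: " + path); }
    }

    // child name -> id of the session that owns the ephemeral node
    private final Map<String, Long> ephemeralOwners = new HashMap<>();

    /** Plain create: fails if the child is already present (the reported behaviour). */
    void create(String child, long sessionId) {
        if (ephemeralOwners.containsKey(child)) {
            throw new NodeExistsException(child);
        }
        ephemeralOwners.put(child, sessionId);
    }

    /**
     * Proposed re-entry: returns true if a node was created, false if an
     * existing node owned by the same session was reused. A node owned by
     * a different session still fails.
     */
    boolean createOrReuse(String child, long sessionId) {
        Long owner = ephemeralOwners.get(child);
        if (owner == null) {
            ephemeralOwners.put(child, sessionId);
            return true;
        }
        if (owner.longValue() == sessionId) {
            return false; // same session: skip creation and keep waiting
        }
        throw new NodeExistsException(child);
    }

    /** Session expiry removes every ephemeral node owned by that session. */
    void expireSession(long sessionId) {
        ephemeralOwners.values().removeIf(owner -> owner.longValue() == sessionId);
    }

    int memberCount() {
        return ephemeralOwners.size();
    }
}
```

With this rule, a client whose session survived the outage re-enters without an error, while a client whose session expired simply creates a fresh node.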

Thanks,
Simon


Originally reported by [email protected], imported from: Need a way to handle connection lost while entering double barrier
  • status: Open
  • priority: Major
  • resolution: Unresolved
  • imported: 2025-01-21

jira-importer · May 24 '16 20:05