Cluster members are tracked by their Address's hashCode

Open bgloeckle opened this issue 8 years ago • 2 comments

ClusterState seems to track cluster members by their "ID", and that ID seems to be (see LeaderState#join(...)) the Object#hashCode() of the node's address.

This seems unsafe, as two nodes with different addresses could have the same hashCode and therefore the same ID. Or am I missing something?

bgloeckle avatar Dec 01 '15 10:12 bgloeckle

No, you're right. The hashCode thing is a trade-off. It's exceedingly unlikely that Address hash codes will collide, since IPs are limited to a specific, comparatively small range of possibilities and the number of servers in a cluster is also small. Even just using the string hash code makes it unlikely they'll ever collide within a single cluster.
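To make the concern concrete, here is a minimal sketch of a collision. It assumes the ID is the `String#hashCode()` of a `"host:port"` string, which is only a stand-in for however `Address` really computes its hash. The hostnames are deliberately crafted (they reuse the well-known `"Aa"`/`"BB"` string collision), so this says nothing about how likely a collision is with real IPs; `HashIdCollision` and `memberId` are hypothetical names.

```java
// Sketch only: String.hashCode() on "host:port" stands in for the real
// Address hash, which may be computed differently.
final class HashIdCollision {

    // Hypothetical helper: derive a 4-byte member ID from host and port.
    static int memberId(String host, int port) {
        return (host + ":" + port).hashCode();
    }

    public static void main(String[] args) {
        // "Aa" and "BB" have the same String hash, so any two strings that
        // differ only by that substring collide as well.
        int a = memberId("nodeAa", 5000);
        int b = memberId("nodeBB", 5000);
        System.out.println(a == b); // prints "true"
    }
}
```

Two distinct addresses end up with one ID here, which is exactly the case where a hash-keyed member table would mix up the two nodes.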

The reason for using hash codes is to provide a space-efficient way to send an Address in requests like AppendRequest. It's effectively a member ID. AppendRequest.leader() is a 4-byte ID rather than a 22-byte host and port. This is just a user-friendly alternative to requiring the user to specify a server ID at startup.
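The size difference can be sketched roughly as follows. The address encoding here (a 2-byte length prefix, the UTF-8 host, then a 4-byte port) is an assumption made for illustration, not the actual wire format, so the exact byte count will differ from the 22 bytes quoted above. `MemberIdSize`, `encodeId` and `encodeAddress` are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Sketch only: compares a 4-byte ID with an assumed host-and-port encoding.
final class MemberIdSize {

    // A member ID is a single int: always 4 bytes.
    static byte[] encodeId(int id) {
        return ByteBuffer.allocate(Integer.BYTES).putInt(id).array();
    }

    // Assumed layout: [2-byte host length][host as UTF-8][4-byte port].
    static byte[] encodeAddress(String host, int port) {
        byte[] hostBytes = host.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(Short.BYTES + hostBytes.length + Integer.BYTES);
        buf.putShort((short) hostBytes.length).put(hostBytes).putInt(port);
        return buf.array();
    }

    public static void main(String[] args) {
        String host = "192.168.100.200";
        int port = 5000;
        int id = (host + ":" + port).hashCode();
        System.out.println("id bytes:      " + encodeId(id).length);
        System.out.println("address bytes: " + encodeAddress(host, port).length);
    }
}
```

Since the leader ID travels in every AppendRequest, the per-request saving is the gap between those two lengths.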

But I think with the changes in the client transport PR (which will be merged soon) this can probably be replaced. That PR adds an extra request type for sending configurations before sending an AppendRequest. So the leader can arbitrarily assign server IDs in its own configuration and send those IDs to followers when replicating that configuration. This wasn't possible in the past, since AppendRequest was the only way the leader replicated its configuration. It would essentially mean that the leader() ID sent in an AppendRequest is relative to the current cluster configuration, which should be fine. Some parts of the protocol, like pre-vote and vote requests, would still have to serialize the Address since they happen before any leader exists, but those requests are infrequent, so that should be fine too.
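A rough sketch of what leader-assigned IDs could look like, assuming the configuration is a simple table that the leader fills in and then replicates. `LeaderAssignedIds`, `assign` and `addressOf` are hypothetical names for illustration and are not taken from the PR.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: the leader hands out sequential IDs instead of hash codes,
// so two distinct addresses can never share an ID.
final class LeaderAssignedIds {
    private final Map<String, Integer> idsByAddress = new HashMap<>();
    private final Map<Integer, String> addressesById = new HashMap<>();
    private int nextId = 1;

    // Leader side: returns the existing ID, or assigns the next free one.
    int assign(String address) {
        Integer existing = idsByAddress.get(address);
        if (existing != null) {
            return existing;
        }
        int id = nextId++;
        idsByAddress.put(address, id);
        addressesById.put(id, address);
        return id;
    }

    // Follower side: resolve an ID (e.g. the leader ID in a request) back
    // to an address using the replicated configuration; null if unknown.
    String addressOf(int id) {
        return addressesById.get(id);
    }

    public static void main(String[] args) {
        LeaderAssignedIds config = new LeaderAssignedIds();
        int a = config.assign("nodeAa:5000");
        int b = config.assign("nodeBB:5000");
        System.out.println(a != b);              // prints "true"
        System.out.println(config.addressOf(b)); // prints "nodeBB:5000"
    }
}
```

The IDs only mean something relative to the configuration that assigned them, which matches the point above that the leader ID would be tied to the current cluster configuration.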

kuujo avatar Dec 01 '15 16:12 kuujo

Sounds great :)

bgloeckle avatar Dec 02 '15 10:12 bgloeckle