mongodb-container

First replica set member does not rejoin the existing replica set after a failure

Open lehmeyer opened this issue 6 years ago • 4 comments

Because the member ID that triggers initialization is hard-coded (0), the first replica set member's pod cannot add itself to the existing replica set formed by the other replicas (1+) after it fails. Instead, the first pod initializes a new replica set.

# Initialize replica set only if we're the first member
if [ "${MEMBER_ID}" = '0' ]; then
  initiate "${MEMBER_HOST}"
else
  add_member "${MEMBER_HOST}"
fi

The error can be reproduced by deleting the first pod together with its PVC. It would be better to check beforehand whether a replica set already exists.
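A minimal sketch of such a check, not the actual fix from any pull request. It assumes a hypothetical `PEER_HOSTS` variable (space-separated `host:port` of the other members) and hypothetical helpers `probe_primary`, `find_existing_primary`, and `choose_action`; the probe relies on the legacy `mongo` shell and `db.isMaster()`, which may differ in newer MongoDB versions.

```shell
#!/bin/bash

# Hypothetical probe: ask one peer which host is primary of its replica set.
# Uses the legacy mongo shell; prints nothing if the peer reports no primary.
probe_primary() {
  mongo --host "$1" --quiet --eval 'print(db.isMaster().primary || "")' 2>/dev/null
}

# Hypothetical helper: return the first primary reported by any peer.
# PEER_HOSTS is an assumed space-separated list of the other members.
find_existing_primary() {
  local host primary
  for host in ${PEER_HOSTS:-}; do
    primary="$(probe_primary "${host}")" || continue
    if [ -n "${primary}" ]; then
      echo "${primary}"
      return 0
    fi
  done
  return 1
}

# Member 0 initiates only when no peer already belongs to a replica set
# with a primary; in every other case the member should be added instead.
choose_action() {
  local member_id="$1"
  if [ "${member_id}" != '0' ] || find_existing_primary >/dev/null; then
    echo add_member
  else
    echo initiate
  fi
}
```

The startup script could then branch on `choose_action "${MEMBER_ID}"` instead of comparing `MEMBER_ID` with `'0'` directly, so a recreated first pod with an empty volume would join the surviving replica set rather than start a second one.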

lehmeyer, Apr 03 '19 12:04

Pull request #305 solves the problem.

lehmeyer, Apr 05 '19 07:04

> The error can be simulated if the first pod and the corresponding PVC are deleted.

@lehmeyer I understand that deleting the PVC will cause the issue. But did you hit the problem when restarting the primary in production use (without manually deleting a volume)?

Kubernetes should remount the volume to the restarted pod, so mongod in the container should successfully rejoin the replica set.

omron93, Apr 17 '19 07:04

> > The error can be simulated if the first pod and the corresponding PVC are deleted.
>
> @lehmeyer I understand that deleting the PVC will cause the issue. But did you hit the problem when restarting the primary in production use (without manually deleting a volume)?
>
> Kubernetes should remount the volume to the restarted pod, so mongod in the container should successfully rejoin the replica set.

The problem occurs when the data of the first replica set member (member ID 0) no longer exists. A split-brain situation then arises, with two independent replica sets that cannot be joined.

lehmeyer, Jul 26 '19 12:07

> The problem occurs when the data of the first replica set member (member ID 0) no longer exists.

I agree that it might be an issue, although I don't know how this situation can happen.

But I think it's quite an edge case, and it might be a bug in Kubernetes. If it's worth fixing, then #305 looks good.

omron93, Sep 20 '19 12:09

The MongoDB container is no longer maintained in this org. Closing.

hhorak, Apr 10 '24 11:04