
Enhance `--force-new-cluster` to support rebuilding single-member cluster from a removed member

didihongsheng opened this issue 3 years ago • 17 comments

What happened?

In a three-member cluster, I removed one member and then started it again with --force-new-cluster=true. The server panicked as below:

`{"level":"info","ts":"2022-03-29T02:27:21.410Z","caller":"rafthttp/pipeline.go:85","msg":"stopped HTTP pipelining with remote peer","local-member-id":"13b41e9f910ab835","remote-peer-id":"14108607b609f3fa"} {"level":"info","ts":"2022-03-29T02:27:21.410Z","caller":"rafthttp/stream.go:442","msg":"stopped stream reader with remote peer","stream-reader-type":"stream MsgApp v2","local-member-id":"13b41e9f910ab835","remote-peer-id":"14108607b609f3fa"} {"level":"info","ts":"2022-03-29T02:27:21.410Z","caller":"rafthttp/stream.go:442","msg":"stopped stream reader with remote peer","stream-reader-type":"stream Message","local-member-id":"13b41e9f910ab835","remote-peer-id":"14108607b609f3fa"} {"level":"info","ts":"2022-03-29T02:27:21.410Z","caller":"rafthttp/peer.go:335","msg":"stopped remote peer","remote-peer-id":"14108607b609f3fa"} {"level":"info","ts":"2022-03-29T02:27:21.410Z","caller":"rafthttp/transport.go:355","msg":"removed remote peer","local-member-id":"13b41e9f910ab835","removed-remote-peer-id":"14108607b609f3fa"} panic: removed all voters

goroutine 187 [running]: go.etcd.io/etcd/raft/v3.(*raft).applyConfChange(0xc0005009a0, 0x0, 0xc00041cea0, 0x1, 0x1, 0x0, 0x0, 0x0, 0x0, 0x0, ...) /tmp/etcd-release-3.5.0/etcd/release/etcd/raft/raft.go:1633 +0x21a go.etcd.io/etcd/raft/v3.(*node).run(0xc000114180) /tmp/etcd-release-3.5.0/etcd/release/etcd/raft/node.go:360 +0x856 created by go.etcd.io/etcd/raft/v3.RestartNode /tmp/etcd-release-3.5.0/etcd/release/etcd/raft/node.go:244 +0x330`

What did you expect to happen?

The removed member should start up normally as a one-member cluster.

How can we reproduce it (as minimally and precisely as possible)?

1. Create a three-member etcd cluster.
2. Remove one of the members.
3. Start the removed member with --force-new-cluster=true.

Anything else we need to know?

No response

Etcd version (please run commands below)

etcd version: 3.5.0
etcdctl version: 3.5.0

Etcd configuration (command line flags or environment variables)

paste your configuration here

Etcd debug information (please run commands below, feel free to obfuscate the IP address or FQDN in the output)

$ etcdctl member list -w table
# paste output here

$ etcdctl --endpoints=<member list> endpoint status -w table
# paste output here

Relevant log output

No response

didihongsheng avatar Mar 29 '22 02:03 didihongsheng

I was able to reproduce this on the latest release-3.5 branch.

Run the example 3-node cluster: goreman -f Procfile start

Remove one member: ./bin/etcdctl member remove 8211f1d0f64f3269

Start it again with the command from the Procfile plus --force-new-cluster:

./bin/etcd --name infra1 --listen-client-urls http://127.0.0.1:2379 --advertise-client-urls http://127.0.0.1:2379 --listen-peer-urls http://127.0.0.1:12380 --initial-advertise-peer-urls http://127.0.0.1:12380 --initial-cluster-token etcd-cluster-1 --initial-cluster 'infra1=http://127.0.0.1:12380,infra2=http://127.0.0.1:22380,infra3=http://127.0.0.1:32380' --initial-cluster-state new --enable-pprof --logger=zap --log-outputs=stderr --force-new-cluster
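For anyone scripting the reproduction, the removal step can also be done through the Go client instead of etcdctl. This is just a minimal sketch, assuming the demo cluster from the Procfile is serving clients on 127.0.0.1:2379 and that the member named infra1 is the one to be removed:

```go
// Minimal sketch: remove the member named "infra1" via the official Go
// client, equivalent to `etcdctl member remove <ID>` above. The endpoint and
// member name are assumptions based on the Procfile example cluster.
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"http://127.0.0.1:2379"},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()

	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	// Look up the target member's ID, then remove it from the cluster.
	resp, err := cli.MemberList(ctx)
	if err != nil {
		log.Fatal(err)
	}
	for _, m := range resp.Members {
		if m.Name == "infra1" {
			if _, err := cli.MemberRemove(ctx, m.ID); err != nil {
				log.Fatal(err)
			}
			fmt.Printf("removed member %x (%s)\n", m.ID, m.Name)
		}
	}
}
```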

I'm not 100% sure this is a bug. The messaging is confusing, but when the node is removed:

''' {"level":"warn","ts":"2022-03-30T01:34:02.687-0700","caller":"etcdserver/server.go:1146","msg":"server error","error":"the member has been permanently removed from the cluster"}

{"level":"warn","ts":"2022-03-30T01:34:02.687-0700","caller":"etcdserver/server.go:1147","msg":"data-dir used by this member must be removed"} '''

I'll take a closer look. Is there an actual production use-case for this?

/assign

lavacat avatar Mar 30 '22 08:03 lavacat

It's a good story. When a member is removed from the cluster, its ID is removed from the db, and the member stops itself automatically.

After the member is started again with --force-new-cluster, it removes all peers on startup. Since the member itself isn't in the member list either, no voter is left at all. Accordingly it fails to start, see confchange.go#L173 .

We could consider supporting this case in the future, but so far we haven't seen any real requirement for it. So we regard it as an unsupported case, just as the log message says: "data-dir used by this member must be removed", which means we can't start the member on that data any more.
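To make the failure mode concrete, here is an illustrative sketch (simplified, with hypothetical names, not the actual raft code) of the invariant that trips here: a configuration change that would leave the cluster with zero voters is rejected, which is exactly what happens when the member's own removal is replayed after --force-new-cluster has stripped the remaining peers.

```go
// Illustrative sketch only: a simplified stand-in for the invariant enforced
// in raft's confchange package. The names memberSet and applyRemove are
// hypothetical; the real check lives in confchange.go.
package main

import (
	"errors"
	"fmt"
)

type memberSet map[uint64]struct{}

// applyRemove drops id from the voter set and rejects the change if it would
// leave no voters at all - the condition reported as "removed all voters".
func applyRemove(voters memberSet, id uint64) error {
	delete(voters, id)
	if len(voters) == 0 {
		return errors.New("removed all voters")
	}
	return nil
}

func main() {
	// --force-new-cluster has already stripped the other peers; replaying the
	// member's own removal then empties the voter set.
	voters := memberSet{0x13b41e9f910ab835: {}}
	if err := applyRemove(voters, 0x13b41e9f910ab835); err != nil {
		fmt.Println("conf change rejected:", err)
	}
}
```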

ahrtr avatar Apr 08 '22 23:04 ahrtr

I think we might also consider improving the error message so it's clearer to the user that this is not an etcd error.

serathius avatar Apr 09 '22 07:04 serathius

Is there a way to manually recover from this position? Some sort of way to restore the cluster?

zawachte avatar Nov 16 '23 20:11 zawachte

I have to bring this back from the dead again: we have a case where we ended up restoring a previously deleted member.

I cobbled together a quick integration test to repro the panic: https://github.com/tjungblu/etcd/commit/ffd784ae1c862cdd01675a14ff652927776b9ca5

panic: removed all voters

goroutine 837 [running]:
go.etcd.io/raft/v3.(*raft).applyConfChange(0xc000782000, {0x0, {0xc0039cb4c0, 0x1, 0x1}, {0x0, 0x0, 0x0}})
	/home/tjungblu/.gvm/pkgsets/go1.24.4/global/pkg/mod/go.etcd.io/raft/[email protected]/raft.go:1904 +0x1cd
go.etcd.io/raft/v3.(*node).run(0xc0039e6240)
	/home/tjungblu/.gvm/pkgsets/go1.24.4/global/pkg/mod/go.etcd.io/raft/[email protected]/node.go:402 +0xa05
created by go.etcd.io/raft/v3.RestartNode in goroutine 150
	/home/tjungblu/.gvm/pkgsets/go1.24.4/global/pkg/mod/go.etcd.io/raft/[email protected]/node.go:287 +0x239

I'm not entirely sure what the best fix for this could be though - especially since this panic is in the raft repo nowadays.

cc @clobrano

tjungblu avatar Sep 04 '25 14:09 tjungblu

I've been debugging this a bit now; the reason the removed member isn't added back is this condition in the raft apply loop:

		if cc.NodeID == 0 {
			// etcd replaces the NodeID with zero if it decides (downstream of
			// raft) to not apply a change, so we have to have explicit code
			// here to ignore these.
			continue
		}

https://github.com/etcd-io/raft/blob/634b2e94995e90fd1d9d2878b4b2ca26b70617dc/confchange/confchange.go#L152-L157

back in etcd, this was actually caused by:

msg":"Validation on configuration change failed","shouldApplyV3":false,"error":"membership: ID removed"

https://github.com/etcd-io/etcd/blob/aeb47eed9a1d72e3eaeaee41fb808484ce225ceb/server/etcdserver/server.go#L2049-L2052

so this is kinda triggered by the ValidateConfigurationChange function that checks all previously removed members in:

https://github.com/etcd-io/etcd/blob/aeb47eed9a1d72e3eaeaee41fb808484ce225ceb/server/etcdserver/api/membership/cluster.go#L320-L322

seems it was introduced with #1623
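Purely as an illustration of the mechanism described above (hypothetical names, not etcd's actual types): when validation rejects the membership change, etcd blanks it by zeroing NodeID, and the apply loop then skips it via the cc.NodeID == 0 branch quoted earlier.

```go
// Illustrative sketch (hypothetical names): validate a membership change
// before applying it, and "blank" the change on failure so the apply loop
// skips it, mirroring the cc.NodeID == 0 branch quoted above.
package main

import (
	"errors"
	"fmt"
)

var errIDRemoved = errors.New("membership: ID removed")

type confChange struct{ NodeID uint64 }

// validateConfChange stands in for ValidateConfigurationChange: an ID that
// was removed earlier is never allowed to rejoin.
func validateConfChange(removed map[uint64]bool, cc confChange) error {
	if removed[cc.NodeID] {
		return errIDRemoved
	}
	return nil
}

func main() {
	removed := map[uint64]bool{0x8211f1d0f64f3269: true}
	cc := confChange{NodeID: 0x8211f1d0f64f3269}

	if err := validateConfChange(removed, cc); err != nil {
		fmt.Println("Validation on configuration change failed:", err)
		cc = confChange{} // zero the change so the apply loop ignores it
	}

	if cc.NodeID == 0 {
		fmt.Println("conf change skipped, membership left unchanged")
	}
}
```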

tjungblu avatar Sep 09 '25 11:09 tjungblu

@tjungblu As @ahrtr stated in https://github.com/etcd-io/etcd/issues/13848#issuecomment-1093449532, the panic is expected and currently we don't support this case of force-new-cluster from deleted member. Can you clarify why you reopened the issue? Thanks!

siyuanfoundation avatar Nov 20 '25 18:11 siyuanfoundation

/cc @fuweid

siyuanfoundation avatar Nov 20 '25 19:11 siyuanfoundation

@tjungblu it sounds like you're trying to add back a removed member to its original cluster, rather than restarting a cluster based on only a severed member. Can you clear up what you were trying to do? Maybe add some steps? You may have a new & different bug/unimplemented feature.

jberkus avatar Nov 20 '25 19:11 jberkus

@tjungblu please raise a feature request ("Enhance '--force-new-cluster' to support rebuilding single-member cluster from a removed member") and elaborate on your real use case. Once the new ticket is raised, please close this one.

ahrtr avatar Nov 20 '25 19:11 ahrtr

Just changed this ticket to a feature request.

ahrtr avatar Nov 20 '25 19:11 ahrtr

@ahrtr I think that @tjungblu actually has a different failure case from the original issue, based on the description.

jberkus avatar Nov 20 '25 22:11 jberkus

@ahrtr I think that @tjungblu actually has a different failure case from the original issue, based on the description.

Yes, I am aware of that. It's just a technical restriction to prevent reusing a removed member's data. From the end user's perspective, the key/value data is still good, so why can't it be reused?

This feature might not be a high priority, depending on the real use cases or requirements.

ahrtr avatar Nov 21 '25 09:11 ahrtr

I might be able to shine a light on this scenario.

It refers to a use case where two nodes in a two-member cluster receive a shutdown request at the same time, so they both attempt to leave the cluster.

What follows is a race condition: while, correctly, only one node successfully commits its own removal, the attempted removal of the other node is still written to its local WAL.

When we later force a new cluster to start from this node's data, the recovery process replays the WAL and incorrectly applies this uncommitted self-removal entry, causing the node to remove itself upon startup.

clobrano avatar Nov 21 '25 15:11 clobrano

When we later force a new cluster to start from this node's data, the recovery process replays the WAL and incorrectly applies this uncommitted self-removal entry, causing the node to remove itself upon startup.

This should be impossible. Starting an etcd instance with --force-new-cluster discards previously uncommitted entries. Unless there is an unknown bug.

It refers to a use case where two nodes in a two-member cluster receive a shutdown request at the same time

It isn't clear what the real use case is. Two nodes receiving a shutdown request at the same time seems a little weird. Usually a cluster upgrade/update should be done in a controlled, orchestrated way.

ahrtr avatar Nov 21 '25 15:11 ahrtr

This should be impossible. Starting an etcd instance with --force-new-cluster discards previously uncommitted entries. Unless there is an unknown bug.

Sorry for the old logs, but this is an excerpt from the startup of an etcd instance with force-new-cluster set. There is only one member at this point, and while replaying the events from the WAL it seems to be removing itself.

Aug 11 21:07:34 master-0 etcd[4128]: {"level":"info","ts":"2025-08-11T21:07:34.025703Z","caller":"membership/cluster.go:473","msg":"removed member","cluster-id":"bc317a66ab203d10","local-member-id":"13f6375dc0bd223c","removed-remote-peer-id":"13f6375dc0bd223c","removed-remote-peer-urls":["https://192.168.111.20:2380"],"removed-remote-peer-is-learner":false}

Aug 11 21:07:34 master-0 etcd[4128]: {"level":"error","ts":"2025-08-11T21:07:34.025773Z","caller":"etcdserver/server.go:2391","msg":"Validation on configuration change failed","shouldApplyV3":true,"error":"membership: ID removed","stacktrace":"..."}

Aug 11 21:07:34 master-0 etcd[4128]: panic: removed all voters

clobrano avatar Nov 24 '25 09:11 clobrano

Aug 11 21:07:34 master-0 etcd[4128]: {"level":"info","ts":"2025-08-11T21:07:34.025703Z","caller":"membership/cluster.go:473","msg":"removed member","cluster-id":"bc317a66ab203d10","local-member-id":"13f6375dc0bd223c","removed-remote-peer-id":"13f6375dc0bd223c","removed-remote-peer-urls":["https://192.168.111.20:2380"],"removed-remote-peer-is-learner":false}

Aug 11 21:07:34 master-0 etcd[4128]: {"level":"error","ts":"2025-08-11T21:07:34.025773Z","caller":"etcdserver/server.go:2391","msg":"Validation on configuration change failed","shouldApplyV3":true,"error":"membership: ID removed","stacktrace":"..."}

Aug 11 21:07:34 master-0 etcd[4128]: panic: removed all voters

This is the expected log. When etcd got restarted, it replayed the "remove member" log entry (note that the entry should have already been committed previously).

See also https://github.com/etcd-io/etcd/issues/13848#issuecomment-1093449532

ahrtr avatar Nov 24 '25 09:11 ahrtr