Metric etcd_server_is_learner does not report the expected value after a snapshot is triggered
Bug report criteria
- [X] This bug report is not security related, security issues should be disclosed privately via [email protected].
- [X] This is not a support request or question, support requests or questions should be raised in the etcd discussion forums.
- [X] You have read the etcd bug reporting guidelines.
- [X] Existing open issues along with etcd frequently asked questions have been checked and this is not a duplicate.
What happened?
Currently, etcd sets the value of etcd_server_is_learner only when it applies a conf change entry (ConfChangeAddNode or ConfChangeAddLearnerNode):
https://github.com/etcd-io/etcd/blob/dc26e816fdb07eeb5fff6586051a21e3afc3005f/server/etcdserver/server.go#L2062C1-L2069C4
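Once a snapshot has been triggered, the raft log entries, including that conf change, are compacted. If the learner joins or restarts after this point, it recovers its membership from the snapshot instead of applying the conf change entry, so etcd_server_is_learner reports 0 while the expected value is 1.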
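To make the failure mode concrete, here is a toy Go model of the two paths described above. It assumes, as the issue states, that the gauge is only updated while applying a conf change entry. All type and function names (`server`, `applyConfChange`, `applySnapshot`, `snapshot`) are hypothetical and do not correspond to etcd's real code; they only mimic the control flow.

```go
package main

import "fmt"

// confChangeType is a hypothetical stand-in for raftpb.ConfChangeType.
type confChangeType int

const (
	addNode        confChangeType = iota // like ConfChangeAddNode (voter or promotion)
	addLearnerNode                       // like ConfChangeAddLearnerNode
)

type confChange struct {
	typ    confChangeType
	member uint64
}

// snapshot is a hypothetical stand-in for a raft snapshot carrying membership.
type snapshot struct {
	learners map[uint64]bool
}

type server struct {
	id           uint64
	learners     map[uint64]bool // this server's view of the membership
	learnerGauge float64         // stands in for etcd_server_is_learner
}

func newServer(id uint64) *server {
	return &server{id: id, learners: map[uint64]bool{}}
}

// applyConfChange is the only place the gauge is updated in this model,
// matching the behaviour reported in the issue.
func (s *server) applyConfChange(cc confChange) {
	s.learners[cc.member] = cc.typ == addLearnerNode
	if cc.member == s.id {
		if cc.typ == addLearnerNode {
			s.learnerGauge = 1
		} else {
			s.learnerGauge = 0
		}
	}
}

// applySnapshot restores membership but, as reported, never touches the gauge.
func (s *server) applySnapshot(snap snapshot) {
	s.learners = map[uint64]bool{}
	for id, isLearner := range snap.learners {
		s.learners[id] = isLearner
	}
}

func main() {
	const learnerID = 4

	// Path A: the learner replays the conf change entry from the raft log.
	a := newServer(learnerID)
	a.applyConfChange(confChange{typ: addLearnerNode, member: learnerID})
	fmt.Println("replayed entry: learner =", a.learners[learnerID], "gauge =", a.learnerGauge)

	// Path B: the entry was compacted, so membership only arrives via a snapshot.
	b := newServer(learnerID)
	b.applySnapshot(snapshot{learners: map[uint64]bool{learnerID: true}})
	fmt.Println("from snapshot:  learner =", b.learners[learnerID], "gauge =", b.learnerGauge)
}
```

Both paths end with the member recorded as a learner, but only the path that replays the entry sets the gauge to 1, which is consistent with the 0 observed in both reproduction cases below.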
What did you expect to happen?
etcd_server_is_learner should always be 1 on a learner member.
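The expectation can be phrased as an invariant: the gauge should follow the member's learner flag in the membership, however that membership was obtained (applied entry, received snapshot, or local snapshot on restart). A minimal sketch of one way to enforce it, assuming a hypothetical `refreshLearnerGauge` helper that is called whenever membership is restored; this is an illustration, not etcd's actual fix.

```go
package main

import "fmt"

// membership is a hypothetical stand-in for the cluster membership view.
type membership struct {
	learners map[uint64]bool
}

func (m membership) isLearner(id uint64) bool { return m.learners[id] }

type server struct {
	id           uint64
	cluster      membership
	learnerGauge float64 // stands in for etcd_server_is_learner
}

// refreshLearnerGauge derives the gauge from membership instead of from the
// conf change entry that happened to be applied (hypothetical helper).
func (s *server) refreshLearnerGauge() {
	if s.cluster.isLearner(s.id) {
		s.learnerGauge = 1
	} else {
		s.learnerGauge = 0
	}
}

// recoverFromSnapshot models both reported cases: joining after compaction
// and restarting after compaction both restore membership from a snapshot.
func (s *server) recoverFromSnapshot(learners map[uint64]bool) {
	s.cluster = membership{learners: learners}
	s.refreshLearnerGauge()
}

func main() {
	s := &server{id: 4}

	s.recoverFromSnapshot(map[uint64]bool{4: true})
	fmt.Println("gauge after snapshot recovery:", s.learnerGauge)

	// After promotion the learner flag is cleared and the gauge follows it.
	s.recoverFromSnapshot(map[uint64]bool{4: false})
	fmt.Println("gauge after promotion:", s.learnerGauge)
}
```

Deriving the value from membership keeps a single source of truth, so any code path that restores membership only needs to call the same refresh step.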
How can we reproduce it (as minimally and precisely as possible)?
Procfile:
etcd1: bin/etcd --name infra1 --snapshot-count=1000 --listen-client-urls http://127.0.0.1:2379 --advertise-client-urls http://127.0.0.1:2379 --listen-peer-urls http://127.0.0.1:12380 --initial-advertise-peer-urls http://127.0.0.1:12380 --initial-cluster-token etcd-cluster-1 --initial-cluster 'infra1=http://127.0.0.1:12380,infra2=http://127.0.0.1:22380,infra3=http://127.0.0.1:32380' --initial-cluster-state new --enable-pprof --logger=zap --log-outputs=stderr
etcd2: bin/etcd --name infra2 --snapshot-count=1000 --listen-client-urls http://127.0.0.1:22379 --advertise-client-urls http://127.0.0.1:22379 --listen-peer-urls http://127.0.0.1:22380 --initial-advertise-peer-urls http://127.0.0.1:22380 --initial-cluster-token etcd-cluster-1 --initial-cluster 'infra1=http://127.0.0.1:12380,infra2=http://127.0.0.1:22380,infra3=http://127.0.0.1:32380' --initial-cluster-state new --enable-pprof --logger=zap --log-outputs=stderr
etcd3: bin/etcd --name infra3 --snapshot-count=1000 --listen-client-urls http://127.0.0.1:32379 --advertise-client-urls http://127.0.0.1:32379 --listen-peer-urls http://127.0.0.1:32380 --initial-advertise-peer-urls http://127.0.0.1:32380 --initial-cluster-token etcd-cluster-1 --initial-cluster 'infra1=http://127.0.0.1:12380,infra2=http://127.0.0.1:22380,infra3=http://127.0.0.1:32380' --initial-cluster-state new --enable-pprof --logger=zap --log-outputs=stderr
Here I set --snapshot-count=1000 so that a snapshot is triggered after fewer writes.
The problem shows up in two situations:
case 1
The learner joins the cluster after a snapshot has been triggered
goreman start
etcdctl member add infra4 --peer-urls="http://127.0.0.1:42380" --learner=true
# trigger snapshot
for _ in {1..1000}
do
etcdctl put foo bar
done
bin/etcd --name infra4 --snapshot-count=1000 --listen-client-urls http://127.0.0.1:42379 --advertise-client-urls http://127.0.0.1:42379 --listen-peer-urls http://127.0.0.1:42380 --initial-advertise-peer-urls http://127.0.0.1:42380 --initial-cluster-token etcd-cluster-1 --initial-cluster 'infra4=http://127.0.0.1:42380,infra1=http://127.0.0.1:12380,infra2=http://127.0.0.1:22380,infra3=http://127.0.0.1:32380' --initial-cluster-state existing --enable-pprof --logger=zap --log-outputs=stderr
etcdctl member list -w table
+------------------+---------+--------+------------------------+------------------------+------------+
| ID | STATUS | NAME | PEER ADDRS | CLIENT ADDRS | IS LEARNER |
+------------------+---------+--------+------------------------+------------------------+------------+
| 27ac39b4cdff305e | started | infra4 | http://127.0.0.1:42380 | http://127.0.0.1:42379 | true |
| 8211f1d0f64f3269 | started | infra1 | http://127.0.0.1:12380 | http://127.0.0.1:2379 | false |
| 91bc3c398fb3c146 | started | infra2 | http://127.0.0.1:22380 | http://127.0.0.1:22379 | false |
| fd422379fda50e48 | started | infra3 | http://127.0.0.1:32380 | http://127.0.0.1:32379 | false |
+------------------+---------+--------+------------------------+------------------------+------------+
curl -s http://127.0.0.1:42379/metrics | grep etcd_server_is_learner
# HELP etcd_server_is_learner Whether or not this member is a learner. 1 if is, 0 otherwise.
# TYPE etcd_server_is_learner gauge
etcd_server_is_learner 0
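When scripting this reproduction, the curl check can be turned into an assertion. A small Go sketch, assuming a hypothetical `metricValue` helper that reads an unlabelled metric from the Prometheus text output of `/metrics`; it skips `# HELP` and `# TYPE` lines and does not handle labelled series, which is enough for etcd_server_is_learner as shown above.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// metricValue returns the value of an unlabelled metric from Prometheus text
// exposition output (hypothetical helper for this reproduction).
func metricValue(exposition, name string) (float64, error) {
	sc := bufio.NewScanner(strings.NewReader(exposition))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		// Skip blank lines and # HELP / # TYPE comments.
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		fields := strings.Fields(line)
		// Exact name match, so e.g. "<name>_total" is not picked up.
		if len(fields) >= 2 && fields[0] == name {
			return strconv.ParseFloat(fields[1], 64)
		}
	}
	if err := sc.Err(); err != nil {
		return 0, err
	}
	return 0, fmt.Errorf("metric %q not found", name)
}

func main() {
	// Output captured from the learner (infra4) in case 1.
	out := `# HELP etcd_server_is_learner Whether or not this member is a learner. 1 if is, 0 otherwise.
# TYPE etcd_server_is_learner gauge
etcd_server_is_learner 0
`
	v, err := metricValue(out, "etcd_server_is_learner")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// member list reports infra4 as a learner, so any value other than 1 is the bug.
	fmt.Println("etcd_server_is_learner =", v, "mismatch:", v != 1)
}
```

The same check applies unchanged to the restart step of case 2.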
case 2
The learner restarts after a snapshot has been triggered
rm -rf infra*
goreman start
etcdctl member add infra4 --peer-urls="http://127.0.0.1:42380" --learner=true
bin/etcd --name infra4 --snapshot-count=1000 --listen-client-urls http://127.0.0.1:42379 --advertise-client-urls http://127.0.0.1:42379 --listen-peer-urls http://127.0.0.1:42380 --initial-advertise-peer-urls http://127.0.0.1:42380 --initial-cluster-token etcd-cluster-1 --initial-cluster 'infra4=http://127.0.0.1:42380,infra1=http://127.0.0.1:12380,infra2=http://127.0.0.1:22380,infra3=http://127.0.0.1:32380' --initial-cluster-state existing --enable-pprof --logger=zap --log-outputs=stderr
# trigger snapshot
for _ in {1..1000}
do
etcdctl put foo bar
done
etcdctl member list -w table
+------------------+---------+--------+------------------------+------------------------+------------+
| ID | STATUS | NAME | PEER ADDRS | CLIENT ADDRS | IS LEARNER |
+------------------+---------+--------+------------------------+------------------------+------------+
| 69bf92e920251adb | started | infra4 | http://127.0.0.1:42380 | http://127.0.0.1:42379 | true |
| 8211f1d0f64f3269 | started | infra1 | http://127.0.0.1:12380 | http://127.0.0.1:2379 | false |
| 91bc3c398fb3c146 | started | infra2 | http://127.0.0.1:22380 | http://127.0.0.1:22379 | false |
| fd422379fda50e48 | started | infra3 | http://127.0.0.1:32380 | http://127.0.0.1:32379 | false |
+------------------+---------+--------+------------------------+------------------------+------------+
curl -s http://127.0.0.1:42379/metrics | grep etcd_server_is_learner
# HELP etcd_server_is_learner Whether or not this member is a learner. 1 if is, 0 otherwise.
# TYPE etcd_server_is_learner gauge
etcd_server_is_learner 1
# kill the learner and restart
bin/etcd --name infra4 --snapshot-count=1000 --listen-client-urls http://127.0.0.1:42379 --advertise-client-urls http://127.0.0.1:42379 --listen-peer-urls http://127.0.0.1:42380 --initial-advertise-peer-urls http://127.0.0.1:42380 --initial-cluster-token etcd-cluster-1 --initial-cluster 'infra4=http://127.0.0.1:42380,infra1=http://127.0.0.1:12380,infra2=http://127.0.0.1:22380,infra3=http://127.0.0.1:32380' --initial-cluster-state existing --enable-pprof --logger=zap --log-outputs=stderr
curl -s http://127.0.0.1:42379/metrics | grep etcd_server_is_learner
# HELP etcd_server_is_learner Whether or not this member is a learner. 1 if is, 0 otherwise.
# TYPE etcd_server_is_learner gauge
etcd_server_is_learner 0
Anything else we need to know?
No response
Etcd version (please run commands below)
$ etcd --version
etcd Version: 3.6.0-alpha.0
Git SHA: 93530f6e0
Go Version: go1.21.3
Go OS/Arch: linux/amd64
Etcd configuration (command line flags or environment variables)
No response
Etcd debug information (please run commands below, feel free to obfuscate the IP address or FQDN in the output)
No response
Relevant log output
No response
Open to track the backport effort.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.