
Metric etcd_server_is_learner doesn't return the expected value after snapshot trigger

Open YaoC opened this issue 2 years ago • 3 comments

Bug report criteria

What happened?

Currently, etcd sets the value of etcd_server_is_learner when it applies a conf change (ConfChangeAddNode or ConfChangeAddLearnerNode):

https://github.com/etcd-io/etcd/blob/dc26e816fdb07eeb5fff6586051a21e3afc3005f/server/etcdserver/server.go#L2062C1-L2069C4

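After a snapshot is triggered, the log entries are compacted. If the learner joins or restarts at this point, the metric etcd_server_is_learner will be 0, while the expected value is 1. Presumably this is because the learner then catches up from the snapshot and never applies the compacted conf change entries that would set the metric.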

What did you expect to happen?

The etcd_server_is_learner metric of a learner should always be 1.

How can we reproduce it (as minimally and precisely as possible)?

Procfile:

etcd1: bin/etcd --name infra1 --snapshot-count=1000 --listen-client-urls http://127.0.0.1:2379 --advertise-client-urls http://127.0.0.1:2379 --listen-peer-urls http://127.0.0.1:12380 --initial-advertise-peer-urls http://127.0.0.1:12380 --initial-cluster-token etcd-cluster-1 --initial-cluster 'infra1=http://127.0.0.1:12380,infra2=http://127.0.0.1:22380,infra3=http://127.0.0.1:32380' --initial-cluster-state new --enable-pprof --logger=zap --log-outputs=stderr
etcd2: bin/etcd --name infra2 --snapshot-count=1000 --listen-client-urls http://127.0.0.1:22379 --advertise-client-urls http://127.0.0.1:22379 --listen-peer-urls http://127.0.0.1:22380 --initial-advertise-peer-urls http://127.0.0.1:22380 --initial-cluster-token etcd-cluster-1 --initial-cluster 'infra1=http://127.0.0.1:12380,infra2=http://127.0.0.1:22380,infra3=http://127.0.0.1:32380' --initial-cluster-state new --enable-pprof --logger=zap --log-outputs=stderr
etcd3: bin/etcd --name infra3 --snapshot-count=1000 --listen-client-urls http://127.0.0.1:32379 --advertise-client-urls http://127.0.0.1:32379 --listen-peer-urls http://127.0.0.1:32380 --initial-advertise-peer-urls http://127.0.0.1:32380 --initial-cluster-token etcd-cluster-1 --initial-cluster 'infra1=http://127.0.0.1:12380,infra2=http://127.0.0.1:22380,infra3=http://127.0.0.1:32380' --initial-cluster-state new --enable-pprof --logger=zap --log-outputs=stderr

Here I set --snapshot-count=1000 to trigger the snapshot earlier.

There are two situations:

case 1

The learner joins the cluster after a snapshot has been triggered

goreman start

etcdctl member add infra4 --peer-urls="http://127.0.0.1:42380" --learner=true

# trigger snapshot
for _ in {1..1000}
do
    etcdctl put foo bar
done

bin/etcd --name infra4 --snapshot-count=1000 --listen-client-urls http://127.0.0.1:42379 --advertise-client-urls http://127.0.0.1:42379 --listen-peer-urls http://127.0.0.1:42380 --initial-advertise-peer-urls http://127.0.0.1:42380 --initial-cluster-token etcd-cluster-1 --initial-cluster 'infra4=http://127.0.0.1:42380,infra1=http://127.0.0.1:12380,infra2=http://127.0.0.1:22380,infra3=http://127.0.0.1:32380' --initial-cluster-state existing --enable-pprof --logger=zap --log-outputs=stderr

etcdctl member list  -w table
+------------------+---------+--------+------------------------+------------------------+------------+
|        ID        | STATUS  |  NAME  |       PEER ADDRS       |      CLIENT ADDRS      | IS LEARNER |
+------------------+---------+--------+------------------------+------------------------+------------+
| 27ac39b4cdff305e | started | infra4 | http://127.0.0.1:42380 | http://127.0.0.1:42379 |       true |
| 8211f1d0f64f3269 | started | infra1 | http://127.0.0.1:12380 |  http://127.0.0.1:2379 |      false |
| 91bc3c398fb3c146 | started | infra2 | http://127.0.0.1:22380 | http://127.0.0.1:22379 |      false |
| fd422379fda50e48 | started | infra3 | http://127.0.0.1:32380 | http://127.0.0.1:32379 |      false |
+------------------+---------+--------+------------------------+------------------------+------------+

curl -s  http://127.0.0.1:42379/metrics | grep etcd_server_is_learner
# HELP etcd_server_is_learner Whether or not this member is a learner. 1 if is, 0 otherwise.
# TYPE etcd_server_is_learner gauge
etcd_server_is_learner 0

case 2

The learner restarts after a snapshot has been triggered

rm -rf infra*

goreman start

etcdctl member add infra4 --peer-urls="http://127.0.0.1:42380" --learner=true

bin/etcd --name infra4 --snapshot-count=1000 --listen-client-urls http://127.0.0.1:42379 --advertise-client-urls http://127.0.0.1:42379 --listen-peer-urls http://127.0.0.1:42380 --initial-advertise-peer-urls http://127.0.0.1:42380 --initial-cluster-token etcd-cluster-1 --initial-cluster 'infra4=http://127.0.0.1:42380,infra1=http://127.0.0.1:12380,infra2=http://127.0.0.1:22380,infra3=http://127.0.0.1:32380' --initial-cluster-state existing --enable-pprof --logger=zap --log-outputs=stderr

# trigger snapshot
for _ in {1..1000}
do
    etcdctl put foo bar
done

etcdctl member list  -w table
+------------------+---------+--------+------------------------+------------------------+------------+
|        ID        | STATUS  |  NAME  |       PEER ADDRS       |      CLIENT ADDRS      | IS LEARNER |
+------------------+---------+--------+------------------------+------------------------+------------+
| 69bf92e920251adb | started | infra4 | http://127.0.0.1:42380 | http://127.0.0.1:42379 |       true |
| 8211f1d0f64f3269 | started | infra1 | http://127.0.0.1:12380 |  http://127.0.0.1:2379 |      false |
| 91bc3c398fb3c146 | started | infra2 | http://127.0.0.1:22380 | http://127.0.0.1:22379 |      false |
| fd422379fda50e48 | started | infra3 | http://127.0.0.1:32380 | http://127.0.0.1:32379 |      false |
+------------------+---------+--------+------------------------+------------------------+------------+

curl -s  http://127.0.0.1:42379/metrics | grep etcd_server_is_learner
# HELP etcd_server_is_learner Whether or not this member is a learner. 1 if is, 0 otherwise.
# TYPE etcd_server_is_learner gauge
etcd_server_is_learner 1

# kill the learner and restart
bin/etcd --name infra4 --snapshot-count=1000 --listen-client-urls http://127.0.0.1:42379 --advertise-client-urls http://127.0.0.1:42379 --listen-peer-urls http://127.0.0.1:42380 --initial-advertise-peer-urls http://127.0.0.1:42380 --initial-cluster-token etcd-cluster-1 --initial-cluster 'infra4=http://127.0.0.1:42380,infra1=http://127.0.0.1:12380,infra2=http://127.0.0.1:22380,infra3=http://127.0.0.1:32380' --initial-cluster-state existing --enable-pprof --logger=zap --log-outputs=stderr

curl -s  http://127.0.0.1:42379/metrics | grep etcd_server_is_learner
# HELP etcd_server_is_learner Whether or not this member is a learner. 1 if is, 0 otherwise.
# TYPE etcd_server_is_learner gauge
etcd_server_is_learner 0
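
In both cases the mismatch is found by comparing the IS LEARNER column of `etcdctl member list -w table` with the metric by eye. Below is a minimal sketch of automating that comparison. It assumes the table layout and metrics output shown above; the function names `learner_metric`, `member_is_learner`, and `check_learner_metric` are hypothetical helpers, not part of etcd or etcdctl.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of etcd/etcdctl) for comparing the
# IS LEARNER column of the member table with etcd_server_is_learner.

# Read Prometheus text on stdin and print the value of etcd_server_is_learner.
# "# HELP" and "# TYPE" lines are skipped because their first field is "#".
learner_metric() {
    awk '$1 == "etcd_server_is_learner" { print $2; exit }'
}

# Read `etcdctl member list -w table` output on stdin and print the
# IS LEARNER value (true/false) for the member whose NAME is $1.
# Assumes the column order shown in this issue: ID, STATUS, NAME,
# PEER ADDRS, CLIENT ADDRS, IS LEARNER.
member_is_learner() {
    local name="$1"
    awk -F'|' -v name="$name" '
        { n = $4; gsub(/ /, "", n) }
        n == name { v = $7; gsub(/ /, "", v); print v; exit }
    '
}

# Compare the two sources for one member.
#   $1: member table text, $2: metrics text, $3: member name
# Prints OK or MISMATCH and returns non-zero on mismatch.
check_learner_metric() {
    local table="$1" metrics="$2" name="$3"
    local is_learner value expected
    is_learner=$(printf '%s\n' "$table" | member_is_learner "$name")
    value=$(printf '%s\n' "$metrics" | learner_metric)
    if [ "$is_learner" = "true" ]; then expected=1; else expected=0; fi
    if [ "$value" = "$expected" ]; then
        echo "OK: $name etcd_server_is_learner=$value"
    else
        echo "MISMATCH: $name etcd_server_is_learner=$value, expected $expected"
        return 1
    fi
}

# Example call against the learner from the reproduction steps:
#   check_learner_metric "$(etcdctl member list -w table)" \
#       "$(curl -s http://127.0.0.1:42379/metrics)" infra4
```

The helpers take the table and metrics as text rather than calling etcdctl and curl themselves, so the same check can be run against saved output from either case.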

Anything else we need to know?

No response

Etcd version (please run commands below)

$ etcd --version
etcd Version: 3.6.0-alpha.0
Git SHA: 93530f6e0
Go Version: go1.21.3
Go OS/Arch: linux/amd64

Relevant log output

No response

YaoC, Dec 29 '23 06:12

Open to track the backport effort.

ahrtr, Aug 03 '25 08:08

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.

github-actions[bot], Oct 03 '25 00:10

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.

github-actions[bot], Dec 07 '25 00:12