Etcdctl error message may confuse users
Bug report criteria
- [ ] This bug report is not security related, security issues should be disclosed privately via [email protected].
- [ ] This is not a support request or question, support requests or questions should be raised in the etcd discussion forums.
- [ ] You have read the etcd bug reporting guidelines.
- [ ] Existing open issues along with etcd frequently asked questions have been checked and this is not a duplicate.
What happened?
When etcdserver takes a long time to process requests, etcdctl reports an timeout error:
{"level":"warn","ts":"2024-01-29T15:39:48.719735+0800","caller":"clientv3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"endpoint://client-0b2c668b-8c6a-4756-b20e-10d1584680c0/172.16.238.100:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"} Error: context deadline exceeded
For this error message, from the user's perspective, this request may have failed, but in reality, it is still possible for it to execute successfully.
What did you expect to happen?
Ensure consistency in logic between the server and client, or refine client-side error messages.
How can we reproduce it (as minimally and precisely as possible)?
Injecting network latency or simply inserting sleep into the source code
Anything else we need to know?
No response
Etcd version (please run commands below)
$ etcd --version
# paste output here
etcd Version: 3.4.27
Git SHA: c92fb80f3
Go Version: go1.19.10
Go OS/Arch: linux/amd64
$ etcdctl version
# paste output here
etcdctl version: 3.4.27
API version: 3.4
Etcd configuration (command line flags or environment variables)
paste your configuration here
version: "3.6" services:
node1: image: etcd:debug1 volumes: - node1-data:/etcd-data expose: - 2379 - 2380 networks: debug_etcd: ipv4_address: 172.16.238.100 environment: - ETCDCTL_API=3 command: - /usr/local/bin/etcd - --data-dir=/etcd-data - --name - node1 - --initial-advertise-peer-urls - http://172.16.238.100:2380 - --listen-peer-urls - http://0.0.0.0:2380 - --advertise-client-urls - http://172.16.238.100:2379 - --listen-client-urls - http://0.0.0.0:2379 - --initial-cluster - node1=http://172.16.238.100:2380,node2=http://172.16.238.101:2380,node3=http://172.16.238.102:2380 - --initial-cluster-state - new - --initial-cluster-token - docker-etcd
node2: image: etcd:debug1 volumes: - node2-data:/etcd-data networks: debug_etcd: ipv4_address: 172.16.238.101 environment: - ETCDCTL_API=3 expose: - 2379 - 2380 command: - /usr/local/bin/etcd - --data-dir=/etcd-data - --name - node2 - --initial-advertise-peer-urls - http://172.16.238.101:2380 - --listen-peer-urls - http://0.0.0.0:2380 - --advertise-client-urls - http://172.16.238.101:2379 - --listen-client-urls - http://0.0.0.0:2379 - --initial-cluster - node1=http://172.16.238.100:2380,node2=http://172.16.238.101:2380,node3=http://172.16.238.102:2380 - --initial-cluster-state - new - --initial-cluster-token - docker-etcd
node3: image: etcd:debug1 volumes: - node3-data:/etcd-data networks: debug_etcd: ipv4_address: 172.16.238.102 environment: - ETCDCTL_API=3 expose: - 2379 - 2380 command: - /usr/local/bin/etcd - --data-dir=/etcd-data - --name - node3 - --initial-advertise-peer-urls - http://172.16.238.102:2380 - --listen-peer-urls - http://0.0.0.0:2380 - --advertise-client-urls - http://172.16.238.102:2379 - --listen-client-urls - http://0.0.0.0:2379 - --initial-cluster - node1=http://172.16.238.100:2380,node2=http://172.16.238.101:2380,node3=http://172.16.238.102:2380 - --initial-cluster-state - new - --initial-cluster-token - docker-etcd
volumes: node1-data: node2-data: node3-data:
networks: debug_etcd: driver: bridge ipam: driver: default config: - subnet: 172.16.238.0/24
Etcd debug information (please run commands below, feel free to obfuscate the IP address or FQDN in the output)
$ etcdctl member list -w table
# paste output here
+------------------+---------+-------+----------------------------+----------------------------+------------+
| ID | STATUS | NAME | PEER ADDRS | CLIENT ADDRS | IS LEARNER |
+------------------+---------+-------+----------------------------+----------------------------+------------+
| daf3fd52e3583ff | started | node3 | http://172.16.238.102:2380 | http://172.16.238.102:2379 | false |
| 422a74f03b622fef | started | node1 | http://172.16.238.100:2380 | http://172.16.238.100:2379 | false |
| ed635d2a2dbef43d | started | node2 | http://172.16.238.101:2380 | http://172.16.238.101:2379 | false |
+------------------+---------+-------+----------------------------+----------------------------+------------+
$ etcdctl --endpoints=<member list> endpoint status -w table
# paste output here
+----------------------------+------------------+---------------+---------+-----------+------------+-----------+------------+--------------------+--------+
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+----------------------------+------------------+---------------+---------+-----------+------------+-----------+------------+--------------------+--------+
| http://172.16.238.100:2379 | 422a74f03b622fef | 3.6.0-alpha.0 | 20 kB | true | false | 2 | 8 | 8 | |
| http://172.16.238.101:2379 | ed635d2a2dbef43d | 3.6.0-alpha.0 | 20 kB | false | false | 2 | 8 | 8 | |
| http://172.16.238.102:2379 | daf3fd52e3583ff | 3.6.0-alpha.0 | 20 kB | false | false | 2 | 8 | 8 | |
+----------------------------+------------------+---------------+---------+-----------+------------+-----------+------------+--------------------+--------+
Relevant log output
No response
@huanghj78 what value are you using for context timeout? And can you increase the context timeout to avoid context deadline errors?
@rittikdasgupta Thank you for your reply! I use the default context.
And can you increase the context timeout to avoid context deadline errors?
My concern is that there will inevitably be occurrences of timeout errors at certain times. In such situations, the inconsistency between the server and client handling logic may lead to user confusion.
@huanghj78您正在使用什么值进行上下文超时?你能增加上下文超时以避免上下文截止日期错误吗?
The data displayed in the 'target' field is inconsistent with what actually occurs.
@huanghj78 I have seen these logs many times before but never bothered to investigate. Do you have any clues on when this occurs/how to reproduce this?
@huanghj78 I have seen these logs many times before but never bothered to investigate. Do you have any clues on when this occurs/how to reproduce this?
Try debugging a test case such as Replacing an etcd member and you'll come across this error. However, if you run the test case directly, you won't be encountering this error since time to complete the operation didn't exceed the time allotted to complete the operation.
@nitishfy is there any AI left for this issue?