etcd icon indicating copy to clipboard operation
etcd copied to clipboard

Etcdctl error message may confuse users

Open huanghj78 opened this issue 1 year ago • 6 comments

Bug report criteria

What happened?

When etcdserver takes a long time to process requests, etcdctl reports an timeout error:

{"level":"warn","ts":"2024-01-29T15:39:48.719735+0800","caller":"clientv3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"endpoint://client-0b2c668b-8c6a-4756-b20e-10d1584680c0/172.16.238.100:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"} Error: context deadline exceeded

For this error message, from the user's perspective, this request may have failed, but in reality, it is still possible for it to execute successfully. image

What did you expect to happen?

Ensure consistency in logic between the server and client, or refine client-side error messages.

How can we reproduce it (as minimally and precisely as possible)?

Injecting network latency or simply inserting sleep into the source code

Anything else we need to know?

No response

Etcd version (please run commands below)

$ etcd --version
# paste output here
etcd Version: 3.4.27
Git SHA: c92fb80f3
Go Version: go1.19.10
Go OS/Arch: linux/amd64
$ etcdctl version
# paste output here
etcdctl version: 3.4.27
API version: 3.4

Etcd configuration (command line flags or environment variables)

paste your configuration here

version: "3.6" services:

node1: image: etcd:debug1 volumes: - node1-data:/etcd-data expose: - 2379 - 2380 networks: debug_etcd: ipv4_address: 172.16.238.100 environment: - ETCDCTL_API=3 command: - /usr/local/bin/etcd - --data-dir=/etcd-data - --name - node1 - --initial-advertise-peer-urls - http://172.16.238.100:2380 - --listen-peer-urls - http://0.0.0.0:2380 - --advertise-client-urls - http://172.16.238.100:2379 - --listen-client-urls - http://0.0.0.0:2379 - --initial-cluster - node1=http://172.16.238.100:2380,node2=http://172.16.238.101:2380,node3=http://172.16.238.102:2380 - --initial-cluster-state - new - --initial-cluster-token - docker-etcd

node2: image: etcd:debug1 volumes: - node2-data:/etcd-data networks: debug_etcd: ipv4_address: 172.16.238.101 environment: - ETCDCTL_API=3 expose: - 2379 - 2380 command: - /usr/local/bin/etcd - --data-dir=/etcd-data - --name - node2 - --initial-advertise-peer-urls - http://172.16.238.101:2380 - --listen-peer-urls - http://0.0.0.0:2380 - --advertise-client-urls - http://172.16.238.101:2379 - --listen-client-urls - http://0.0.0.0:2379 - --initial-cluster - node1=http://172.16.238.100:2380,node2=http://172.16.238.101:2380,node3=http://172.16.238.102:2380 - --initial-cluster-state - new - --initial-cluster-token - docker-etcd

node3: image: etcd:debug1 volumes: - node3-data:/etcd-data networks: debug_etcd: ipv4_address: 172.16.238.102 environment: - ETCDCTL_API=3 expose: - 2379 - 2380 command: - /usr/local/bin/etcd - --data-dir=/etcd-data - --name - node3 - --initial-advertise-peer-urls - http://172.16.238.102:2380 - --listen-peer-urls - http://0.0.0.0:2380 - --advertise-client-urls - http://172.16.238.102:2379 - --listen-client-urls - http://0.0.0.0:2379 - --initial-cluster - node1=http://172.16.238.100:2380,node2=http://172.16.238.101:2380,node3=http://172.16.238.102:2380 - --initial-cluster-state - new - --initial-cluster-token - docker-etcd

volumes: node1-data: node2-data: node3-data:

networks: debug_etcd: driver: bridge ipam: driver: default config: - subnet: 172.16.238.0/24

Etcd debug information (please run commands below, feel free to obfuscate the IP address or FQDN in the output)

$ etcdctl member list -w table
# paste output here
+------------------+---------+-------+----------------------------+----------------------------+------------+
|        ID        | STATUS  | NAME  |         PEER ADDRS         |        CLIENT ADDRS        | IS LEARNER |
+------------------+---------+-------+----------------------------+----------------------------+------------+
|  daf3fd52e3583ff | started | node3 | http://172.16.238.102:2380 | http://172.16.238.102:2379 |      false |
| 422a74f03b622fef | started | node1 | http://172.16.238.100:2380 | http://172.16.238.100:2379 |      false |
| ed635d2a2dbef43d | started | node2 | http://172.16.238.101:2380 | http://172.16.238.101:2379 |      false |
+------------------+---------+-------+----------------------------+----------------------------+------------+

$ etcdctl --endpoints=<member list> endpoint status -w table
# paste output here
+----------------------------+------------------+---------------+---------+-----------+------------+-----------+------------+--------------------+--------+
|          ENDPOINT          |        ID        |    VERSION    | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+----------------------------+------------------+---------------+---------+-----------+------------+-----------+------------+--------------------+--------+
| http://172.16.238.100:2379 | 422a74f03b622fef | 3.6.0-alpha.0 |   20 kB |      true |      false |         2 |          8 |                  8 |        |
| http://172.16.238.101:2379 | ed635d2a2dbef43d | 3.6.0-alpha.0 |   20 kB |     false |      false |         2 |          8 |                  8 |        |
| http://172.16.238.102:2379 |  daf3fd52e3583ff | 3.6.0-alpha.0 |   20 kB |     false |      false |         2 |          8 |                  8 |        |
+----------------------------+------------------+---------------+---------+-----------+------------+-----------+------------+--------------------+--------+

Relevant log output

No response

huanghj78 avatar Jan 29 '24 07:01 huanghj78

@huanghj78 what value are you using for context timeout? And can you increase the context timeout to avoid context deadline errors?

rittikdasgupta avatar Jan 29 '24 19:01 rittikdasgupta

@rittikdasgupta Thank you for your reply! I use the default context.

And can you increase the context timeout to avoid context deadline errors?

My concern is that there will inevitably be occurrences of timeout errors at certain times. In such situations, the inconsistency between the server and client handling logic may lead to user confusion.

huanghj78 avatar Jan 30 '24 03:01 huanghj78

@huanghj78您正在使用什么值进行上下文超时?你能增加上下文超时以避免上下文截止日期错误吗?

The data displayed in the 'target' field is inconsistent with what actually occurs.

xulin152 avatar Jan 30 '24 08:01 xulin152

@huanghj78 I have seen these logs many times before but never bothered to investigate. Do you have any clues on when this occurs/how to reproduce this?

shreemaan-abhishek avatar Feb 02 '24 03:02 shreemaan-abhishek

@huanghj78 I have seen these logs many times before but never bothered to investigate. Do you have any clues on when this occurs/how to reproduce this?

Try debugging a test case such as Replacing an etcd member and you'll come across this error. However, if you run the test case directly, you won't be encountering this error since time to complete the operation didn't exceed the time allotted to complete the operation.

nitishfy avatar Feb 13 '24 18:02 nitishfy

@nitishfy is there any AI left for this issue?

siyuanfoundation avatar Apr 25 '24 18:04 siyuanfoundation