EtcdError is missing a GRPCStatus method, which causes gRPC's status.FromError to always return codes.Unknown

Open lavacat opened this issue 2 years ago • 5 comments

What happened?

I discovered this issue while debugging a watch that was not retrying in openWatchClient.

isHaltErr was returning true for ErrLeaderChanged.

This applies only when auth is enabled. The call path is w.remote.Watch(w.ctx, w.callOpts...) -> streamClientInterceptor -> getToken -> Auth.Authenticate -> toErr. toErr converts ErrGRPCLeaderChanged to ErrLeaderChanged, and isHaltErr(ErrLeaderChanged) returns true.

When auth is disabled, Auth.Authenticate isn't called and toErr conversion doesn't happen.
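The mechanism can be illustrated with a stdlib-only model. This is a sketch under assumptions, not the etcd or gRPC source: `Code`, `statusCarrier`, `wireErr`, `plainErr`, `codeOf`, and `isHalt` are hypothetical stand-ins for `codes.Code`, the interface that `status.FromError` looks for, the original gRPC status error, the converted client error, `status.FromError`, and `isHaltErr` respectively. It also assumes that the halt check treats only Unavailable and Internal as retryable and that the leader-changed error carries Unavailable.

```go
package main

import "fmt"

// Code is a simplified stand-in for grpc's codes.Code (assumption: only
// the values needed for this illustration are modeled).
type Code int

const (
	Unknown Code = iota
	Internal
	Unavailable
)

// statusCarrier models the interface a status lookup checks for. The real
// method returns a *status.Status; here it returns the bare Code.
type statusCarrier interface {
	GRPCStatus() Code
}

// wireErr models the original gRPC status error (e.g. ErrGRPCLeaderChanged).
type wireErr struct {
	code Code
	desc string
}

func (e wireErr) Error() string    { return e.desc }
func (e wireErr) GRPCStatus() Code { return e.code }

// plainErr models the converted client error (e.g. ErrLeaderChanged): it
// still stores a code, but exposes no GRPCStatus method.
type plainErr struct {
	code Code
	desc string
}

func (e plainErr) Error() string { return e.desc }

// codeOf models the status lookup: an error without GRPCStatus maps to Unknown.
func codeOf(err error) Code {
	if sc, ok := err.(statusCarrier); ok {
		return sc.GRPCStatus()
	}
	return Unknown
}

// isHalt models the retry decision: anything other than Unavailable or
// Internal is treated as a reason to stop retrying.
func isHalt(err error) bool {
	c := codeOf(err)
	return c != Unavailable && c != Internal
}

func main() {
	wire := wireErr{code: Unavailable, desc: "etcdserver: leader changed"}
	conv := plainErr{code: Unavailable, desc: "etcdserver: leader changed"}

	fmt.Println("auth disabled, halt:", isHalt(wire)) // false: watch retries
	fmt.Println("auth enabled, halt:", isHalt(conv))  // true: watch gives up
}
```

Both errors describe the same condition and store the same code. Only the one that exposes the method is classified as retryable.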

What did you expect to happen?

I expected the watch to retry.

How can we reproduce it (as minimally and precisely as possible)?

See unit and integration test in the PR.

Note: the test uses ErrGRPCNoLeader, but in production we observed ErrLeaderChanged.

Anything else we need to know?

Another side effect of this issue is that, on error, the Prometheus interceptor only ever increments the counter for the Unknown gRPC code.

I suspect this might also affect the behavior of retry_interceptor and client/v3/leasing, which retry on transient errors.
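To make the metrics side effect concrete, here is a stdlib-only sketch of a per-code error counter. It is an illustration under assumptions, not the Prometheus interceptor or the etcd client code: `Code`, `clientErr`, `clientErrWithStatus`, `codeOf`, and `tally` are hypothetical names, and the `GRPCStatus` method here returns a bare code rather than the `*status.Status` that the real interface uses. The second type shows one possible remedy suggested by the issue title, namely giving the client error a `GRPCStatus` method.

```go
package main

import "fmt"

// Code is a simplified stand-in for a gRPC status code label.
type Code string

const (
	OK          Code = "OK"
	Unknown     Code = "Unknown"
	Unavailable Code = "Unavailable"
)

// clientErr is a hypothetical client-side error that stores a code but
// does not expose it through a GRPCStatus method.
type clientErr struct {
	code Code
	desc string
}

func (e clientErr) Error() string { return e.desc }

// clientErrWithStatus is the same error with the missing method added.
type clientErrWithStatus struct{ clientErr }

func (e clientErrWithStatus) GRPCStatus() Code { return e.code }

// codeOf models how a code is derived from an error: nil is OK, an error
// exposing GRPCStatus reports its own code, anything else is Unknown.
func codeOf(err error) Code {
	if err == nil {
		return OK
	}
	if sc, ok := err.(interface{ GRPCStatus() Code }); ok {
		return sc.GRPCStatus()
	}
	return Unknown
}

// tally models a metrics counter keyed by status code.
func tally(errs ...error) map[Code]int {
	counts := make(map[Code]int)
	for _, err := range errs {
		counts[codeOf(err)]++
	}
	return counts
}

func main() {
	base := clientErr{code: Unavailable, desc: "etcdserver: leader changed"}

	// Without the method, every failure lands in the Unknown bucket.
	fmt.Println(tally(base, base))

	// With the method, failures are counted under their real code.
	fmt.Println(tally(clientErrWithStatus{base}, clientErrWithStatus{base}))
}
```

The same lookup would feed any retry logic that branches on the status code, which is why the retry paths mentioned above could be affected as well.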

Etcd version (please run commands below)

etcd 3.4, 3.5 and main

$ etcd --version
etcd Version: 3.6.0-alpha.0
Git SHA: f43dccb2f
Go Version: go1.19.2
Go OS/Arch: darwin/amd64

$ etcdctl version
etcdctl version: 3.6.0-alpha.0
API version: 3.6

Relevant log output

No response

lavacat, Mar 21 '23 08:03

This is related to the auth implementation. cc @mitake @ahrtr

lavacat, Apr 03 '23 21:04

cc @ptabor

serathius, Apr 04 '23 10:04

Thanks @lavacat, I'll probably be able to check sometime on Wednesday. Sorry for keeping you waiting.

mitake, Apr 10 '23 13:04

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.

stale[bot], Aug 12 '23 07:08

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.

github-actions[bot], Jun 13 '25 00:06