EtcdError is missing GRPCStatus that causes grpc.status.FromError to always return codes.Unknown
What happened?
I've discovered this issue when debugging watch not retrying in openWatchClient
isHaltErr was returning true on ErrLeaderChanged
This is only applicable when auth is enabled.
w.remote.Watch(w.ctx, w.callOpts...) -> streamClientInterceptor-> getToken -> Auth.Authenticate -> toErr
toErr will convert ErrGRPCLeaderChanged to ErrLeaderChanged, isHaltErr(ErrLeaderChanged) returns true
When auth is disabled, Auth.Authenticate isn't called and toErr conversion doesn't happen.
What did you expect to happen?
expected watch to retry
How can we reproduce it (as minimally and precisely as possible)?
See unit and integration test in the PR.
Note, test is using ErrGRPCNoLeader, but in production we've observed ErrLeaderChanged.
Anything else we need to know?
Another side effect of this issue is that prometheus interceptor will only increment Unknown grpc code in metrics on error.
I suspect this might also affect behavior of retry_interceptor and client/v3/leasing that retries on transient error.
Etcd version (please run commands below)
etcd 3.4, 3.5 and main
$ etcd --version
etcd Version: 3.6.0-alpha.0
Git SHA: f43dccb2f
Go Version: go1.19.2
Go OS/Arch: darwin/amd64
$ etcdctl version
etcdctl version: 3.6.0-alpha.0
API version: 3.6
Etcd configuration (command line flags or environment variables)
paste your configuration here
Etcd debug information (please run commands below, feel free to obfuscate the IP address or FQDN in the output)
$ etcdctl member list -w table
# paste output here
$ etcdctl --endpoints=<member list> endpoint status -w table
# paste output here
Relevant log output
No response
this is related to auth implementation cc @mitake @ahrtr
cc @ptabor
Thanks @lavacat , probably I'll be able to check sometime Wednesday, sorry for keeping you waiting.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.