etcd
etcd copied to clipboard
Flaky integration test case `TestV3AuthWithLeaseRevokeWithRootJWT`
Which github workflows are flaking?
test (linux-amd64-integration-1-cpu)
Which tests are flaking?
TestV3AuthWithLeaseRevokeWithRootJWT
Github Action link
https://github.com/etcd-io/etcd/actions/runs/5976797722/job/16215421678?pr=16472
Reason for failure (if possible)
v3_auth_test.go:190: lease 61048a2d0509a20b should have been revoked, got cluster_id:3559229121123123792 member_id:564804756584587524 revision:2 raft_term:2
Anything else we need to know?
No response
I think the reason for the failure is that the JWT has expired when server sending LeaseRevoke request.
https://github.com/etcd-io/etcd/blob/27633923e45bca9e6699c9d68df5790840370728/server/etcdserver/server.go#L875-L886
The log shows:
logger.go:130: 2023-08-25T14:06:59.695Z DEBUG client retrying of unary invoker {"target": "etcd-endpoints://0xc000302780/localhost:m0", "method": "/etcdserverpb.Lease/LeaseGrant", "attempt": 0}
logger.go:130: 2023-08-25T14:06:59.702Z DEBUG client retrying of unary invoker {"target": "etcd-endpoints://0xc000302780/localhost:m0", "method": "/etcdserverpb.KV/Put", "attempt": 0}
logger.go:130: 2023-08-25T14:07:02.013Z WARN m0 failed to parse a JWT token {"member": "m0", "error": "Token is expired"}
logger.go:130: 2023-08-25T14:07:02.013Z WARN m0 invalid auth token {"member": "m0", "token": "***"}
logger.go:130: 2023-08-25T14:07:02.013Z WARN m0 failed to revoke lease {"member": "m0", "lease-id": "61048a2d0509a20b", "error": "auth: invalid auth token"}
Since the lease ttl is 2, It would expired at 2023-08-25T14:07:01.695Z. And I believe the server called s.authStore.WithRoot before 2023-08-25T14:07:02.000Z, which will result in the expiration of JWT being 2023-08-25T14:07:02.000Z. JWT ttl is 1s and the precision of its expiration is in seconds.
https://github.com/etcd-io/etcd/blob/27633923e45bca9e6699c9d68df5790840370728/server/auth/jwt.go#L101-L106
I have run this test on my devbox and usually it only takes less than 10ms to send LeaseRevoke request. Perhaps the CPU throttling caused this method to take longer, and it just happened to jump to the next second when checking for expiration. Maybe we can change JWT ttl to 2s, so that it won't expire immediately.
@YaoC would you please to send patch to fix this? Thanks!
@YaoC would you please to send patch to fix this? Thanks!
Sure, I'll fix it.
same happened on the 3.5 branch PR here: https://github.com/etcd-io/etcd/actions/runs/8895381443/job/24425596792?pr=17914
2024-04-30T13:18:16.0246105Z logger.go:130: 2024-04-30T13:14:37.006Z WARN m0 failed to parse a JWT token {"member": "m0", "token": "***", "error": "Token is expired"}
2024-04-30T13:18:16.0255248Z logger.go:130: 2024-04-30T13:14:37.006Z WARN m0 invalid auth token {"member": "m0", "token": "***"}
2024-04-30T13:18:16.0258064Z logger.go:130: 2024-04-30T13:14:37.006Z WARN m0 failed to revoke lease {"member": "m0", "lease-id": "4ebb8f2f24887a0b", "error": "auth: invalid auth token"}
2024-04-30T13:18:16.0266679Z logger.go:130: 2024-04-30T13:14:37.613Z WARN m0 failed to parse a JWT token {"member": "m0", "token": "***", "error": "Token is expired"}
2024-04-30T13:18:16.0274791Z logger.go:130: 2024-04-30T13:14:37.613Z WARN m0 invalid auth token {"member": "m0", "token": "***"}
@YaoC do you want to backport your PR from #16523?