etcd icon indicating copy to clipboard operation
etcd copied to clipboard

Flaky integration test case `TestV3AuthWithLeaseRevokeWithRootJWT`

Open ahrtr opened this issue 2 years ago • 4 comments

Which github workflows are flaking?

test (linux-amd64-integration-1-cpu)

Which tests are flaking?

TestV3AuthWithLeaseRevokeWithRootJWT

Github Action link

https://github.com/etcd-io/etcd/actions/runs/5976797722/job/16215421678?pr=16472

Reason for failure (if possible)

v3_auth_test.go:190: lease 61048a2d0509a20b should have been revoked, got cluster_id:3559229121123123792 member_id:564804756584587524 revision:2 raft_term:2 

Anything else we need to know?

No response

ahrtr avatar Aug 25 '23 14:08 ahrtr

I think the reason for the failure is that the JWT has expired when server sending LeaseRevoke request.

https://github.com/etcd-io/etcd/blob/27633923e45bca9e6699c9d68df5790840370728/server/etcdserver/server.go#L875-L886

The log shows:

logger.go:130: 2023-08-25T14:06:59.695Z	DEBUG	client	retrying of unary invoker	{"target": "etcd-endpoints://0xc000302780/localhost:m0", "method": "/etcdserverpb.Lease/LeaseGrant", "attempt": 0}
logger.go:130: 2023-08-25T14:06:59.702Z	DEBUG	client	retrying of unary invoker	{"target": "etcd-endpoints://0xc000302780/localhost:m0", "method": "/etcdserverpb.KV/Put", "attempt": 0}
logger.go:130: 2023-08-25T14:07:02.013Z	WARN	m0	failed to parse a JWT token	{"member": "m0", "error": "Token is expired"}
logger.go:130: 2023-08-25T14:07:02.013Z	WARN	m0	invalid auth token	{"member": "m0", "token": "***"}
logger.go:130: 2023-08-25T14:07:02.013Z	WARN	m0	failed to revoke lease	{"member": "m0", "lease-id": "61048a2d0509a20b", "error": "auth: invalid auth token"}

Since the lease ttl is 2, It would expired at 2023-08-25T14:07:01.695Z. And I believe the server called s.authStore.WithRoot before 2023-08-25T14:07:02.000Z, which will result in the expiration of JWT being 2023-08-25T14:07:02.000Z. JWT ttl is 1s and the precision of its expiration is in seconds.

https://github.com/etcd-io/etcd/blob/27633923e45bca9e6699c9d68df5790840370728/server/auth/jwt.go#L101-L106

I have run this test on my devbox and usually it only takes less than 10ms to send LeaseRevoke request. Perhaps the CPU throttling caused this method to take longer, and it just happened to jump to the next second when checking for expiration. Maybe we can change JWT ttl to 2s, so that it won't expire immediately.

YaoC avatar Sep 01 '23 14:09 YaoC

@YaoC would you please to send patch to fix this? Thanks!

fuweid avatar Sep 01 '23 14:09 fuweid

@YaoC would you please to send patch to fix this? Thanks!

Sure, I'll fix it.

YaoC avatar Sep 01 '23 16:09 YaoC

same happened on the 3.5 branch PR here: https://github.com/etcd-io/etcd/actions/runs/8895381443/job/24425596792?pr=17914

2024-04-30T13:18:16.0246105Z     logger.go:130: 2024-04-30T13:14:37.006Z	WARN	m0	failed to parse a JWT token	{"member": "m0", "token": "***", "error": "Token is expired"}
2024-04-30T13:18:16.0255248Z     logger.go:130: 2024-04-30T13:14:37.006Z	WARN	m0	invalid auth token	{"member": "m0", "token": "***"}
2024-04-30T13:18:16.0258064Z     logger.go:130: 2024-04-30T13:14:37.006Z	WARN	m0	failed to revoke lease	{"member": "m0", "lease-id": "4ebb8f2f24887a0b", "error": "auth: invalid auth token"}
2024-04-30T13:18:16.0266679Z     logger.go:130: 2024-04-30T13:14:37.613Z	WARN	m0	failed to parse a JWT token	{"member": "m0", "token": "***", "error": "Token is expired"}
2024-04-30T13:18:16.0274791Z     logger.go:130: 2024-04-30T13:14:37.613Z	WARN	m0	invalid auth token	{"member": "m0", "token": "***"}

@YaoC do you want to backport your PR from #16523?

tjungblu avatar Apr 30 '24 13:04 tjungblu