grpcproxy: use metadata instead of context.WithValue in withClientAuthToken
Change to use metadata instead of context.WithValue to ensure each proxy watcher client has a new stream created with its own token.
Previously, context.WithValue resulted in streamKeyFromCtx returning an empty string in the clientv3 watcher, causing stream reuse.
When new clients connected to the proxy after the token expired (the token for the initial client that connected), the reused stream's context still contained the expired token. This caused auth failures when isWatchPermitted on the cluster checked the stream's context, leaving proxy watcher clients hanging.
The issue can be reproduced by setting a low --auth-token-ttl on the cluster, connecting one client to the proxy, and then connecting a second one after the token has expired.
Hi @krijohs. Thanks for your PR.
I'm waiting for an etcd-io member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.
Once the patch is verified, the new status will be reflected by the ok-to-test label.
I understand the commands that are listed here.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
Added a test case which reproduces the issue; with the included change it passes, but without it, it fails.
Has anyone had a chance to look at this? Pinging reviewers from the OWNERS file, @fuweid @ivanvc
Hi @krijohs, thanks for your pull request. Ideally, we would want to discuss the issue and possible solutions before a pull request. Could you please open an issue so other members with more expertise in this area can jump in?
Thanks again.
Hello @ivanvc, ok, got it. I will open an issue so possible solutions can be discussed, thanks.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.
/reopen
/ok-to-test
Codecov Report
:x: Patch coverage is 40.00000% with 3 lines in your changes missing coverage. Please review.
:white_check_mark: Project coverage is 69.33%. Comparing base (431a65a) to head (1f5402b).
:warning: Report is 461 commits behind head on main.
| Files with missing lines | Patch % | Lines |
|---|---|---|
| server/proxy/grpcproxy/util.go | 40.00% | 3 Missing :warning: |
Additional details and impacted files
| Files with missing lines | Coverage Δ | |
|---|---|---|
| server/proxy/grpcproxy/util.go | 41.93% <40.00%> (+21.93%) | :arrow_up: |
... and 58 files with indirect coverage changes
```diff
@@            Coverage Diff             @@
##             main   #19033      +/-   ##
==========================================
+ Coverage   69.21%   69.33%   +0.12%
==========================================
  Files         419      422       +3
  Lines       34745    34842      +97
==========================================
+ Hits        24049    24158     +109
+ Misses       9300     9292       -8
+ Partials     1396     1392       -4
```
Continue to review full report in Codecov by Sentry.
Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 431a65a...1f5402b. Read the comment docs.
/retest
@krijohs, could you please rebase your branch with the latest upstream main branch? I'm having issues running the tests because the base is outdated. Thanks :)
@ivanvc sure no problem, just rebased and pushed
Hi @ivanvc, just checking if anyone has had time to look at this PR?
Pls squash the commit, thx
> Pls squash the commit, thx
sure, just squashed and pushed
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: ahrtr, ivanvc, krijohs
The full list of commands accepted by this bot can be found here.
The pull request process is described here
- ~~OWNERS~~ [ahrtr,ivanvc]
Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment
@mitake would you be able to take a look at this PR as well? Thanks. It's related to your PR https://github.com/etcd-io/etcd/pull/8289.
@krijohs can you update the e2e test to verify the stream request (e.g. watch) via grpcproxy also work when auth is enabled?
> @krijohs can you update the e2e test to verify the stream request (e.g. watch) via grpcproxy also work when auth is enabled?
If I'm understanding you correctly:
The added e2e test TestGRPCProxyWatchersAfterTokenExpiry in this PR already uses authenticated watch streams through the grpc-proxy; all three watches use WithAuth("root", "rootPassword") and WithEndpoints(proxyClientURL) while auth is enabled on the server.
It verifies that the proxy forwards auth for streaming requests. If you like, I can add a comment to make it more obvious.
> The added e2e test TestGRPCProxyWatchersAfterTokenExpiry in this PR already uses authenticated watch streams through the grpc-proxy
Why did the test not see any issue before you resolved https://github.com/etcd-io/etcd/pull/19033#discussion_r2541813945?
> The added e2e test TestGRPCProxyWatchersAfterTokenExpiry in this PR already uses authenticated watch streams through the grpc-proxy
>
> Why did the test not see any issue before you resolved #19033 (comment)?
When the grpc proxy sets up a new watch broadcast, it calls withClientAuthToken in newWatchBroadcast, so AuthStreamClientInterceptor is not used for watchers, from what I can see.
I can try to create an e2e test that verifies the latest changes made to AuthStreamClientInterceptor.
I think if the token expires, then the test should fail (before you resolved my comment). It should work now as you have already resolved it. Can you double check this using a test case or manually?
Refer to https://github.com/etcd-io/etcd/issues/11954