etcd
server/auth: invalidate range permission cache during recovering from snapshot
Fix https://github.com/etcd-io/etcd/issues/13883
The above issue reports that authStore.Recover() doesn't invalidate rangePermCache, so an etcd node that was isolated from its cluster might keep serving stale permission cache entries after the network partition is resolved. This PR fixes the issue by invalidating the cache defensively during recovery.
cc @ptabor
Codecov Report
Merging #13920 (2e034d2) into main (0c9a4e0) will decrease coverage by 0.80%. The diff coverage is 84.00%.
:exclamation: Current head 2e034d2 differs from pull request most recent head 241d211. Consider uploading reports for the commit 241d211 to get more accurate results
```
@@            Coverage Diff             @@
##             main   #13920      +/-   ##
==========================================
- Coverage   72.71%   71.91%   -0.81%
==========================================
  Files         469      469
  Lines       38398    38414      +16
==========================================
- Hits        27923    27624     -299
- Misses       8710     9016     +306
- Partials     1765     1774       +9
```
| Flag | Coverage Δ | |
|---|---|---|
| all | 71.91% <84.00%> (-0.81%) | :arrow_down: |
Flags with carried forward coverage won't be shown.
| Impacted Files | Coverage Δ | |
|---|---|---|
| api/v3rpc/rpctypes/error.go | 90.47% <ø> (ø) | |
| server/etcdserver/api/v3rpc/util.go | 74.19% <ø> (ø) | |
| server/etcdserver/errors.go | 0.00% <ø> (ø) | |
| server/etcdserver/v3_server.go | 78.17% <80.00%> (-0.21%) | :arrow_down: |
| client/v3/mirror/syncer.go | 76.19% <100.00%> (+1.83%) | :arrow_up: |
| server/etcdserver/server.go | 84.38% <100.00%> (-0.49%) | :arrow_down: |
| server/proxy/httpproxy/reverse.go | 0.00% <0.00%> (-63.03%) | :arrow_down: |
| server/proxy/httpproxy/metrics.go | 38.46% <0.00%> (-61.54%) | :arrow_down: |
| client/v3/snapshot/v3_snapshot.go | 0.00% <0.00%> (-54.35%) | :arrow_down: |
| server/proxy/httpproxy/proxy.go | 25.58% <0.00%> (-46.52%) | :arrow_down: |
| ... and 40 more | | |
Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 0c9a4e0...241d211.
Is this reproducible on isolated members with serializable requests? Could we maybe add e2e tests?
Yeah, let me add e2e test cases.
https://github.com/etcd-io/etcd/pull/13954 was merged so I'll resume this PR.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.
@mitake We should have already resolved the issue. Could you double-confirm this?
FYI. https://github.com/etcd-io/etcd/pull/14227 https://github.com/etcd-io/etcd/pull/14358 https://github.com/etcd-io/etcd/pull/14574
@ahrtr Yes, I think we can close this PR. The most reliable way to cause this issue was a membership change, as shown in https://github.com/etcd-io/etcd/issues/14571
I think that prior to https://github.com/etcd-io/etcd/pull/13954, the stale rangePermCache could rarely cause real issues, because rangePermCache can be stale only if this sequence happens: 1. network partition, 2. auth config update, 3. WAL compaction, 4. rejoin.
Note that a network partition caused by restarting the etcd process makes rangePermCache empty, which helped avoid the inconsistency issue in the above sequence until https://github.com/etcd-io/etcd/issues/14571. This is because, before the PR, rangePermCache only needed to be updated during data access, so an empty rangePermCache after a restart isn't harmful.
Thanks @mitake