etcd
etcd copied to clipboard
client: support Root CA rotation on server side
- In TlsConfig creation, use
GetConfigForClient
instead ofGetCertificate
, so that we can load the latest CA. - Add a go routine to refresh tls config from given cert/key paths, such reload loop can be enabled through flags.
- Change TlsInfo to pointer receiver to avoid unnecessary copy
- Fix all TlsInfo pass by values because sync.Once cannot be copied. Provide TlsInfo.Clone to copy TlsInfo
- Unit tests and integration tests
Fixes #11555
Not sure about if we still need e2e tests. In the integration test, server start and client conn with new certs is already tested.
Can you please cleanup commits? There are over 22 commits (most of them called "fix ..."), could you squash them and maybe rename them so they could be used for review?
@serathius ping on this. Happy NY
ping again.
I think we need this feature anyway as root ca rotation is one of the fundamental operations.
cc @serathius bcz you helped review this PR before. Please review it or help me find the person to review. Thanks
Sorry for delayed response, It's in my review queue. Can you fix the conflicts and rebase the change?
change to Draft until I fix the broken tests
Codecov Report
Merging #13307 (8aeb985) into main (3dce380) will increase coverage by
0.48%
. The diff coverage is76.00%
.
@@ Coverage Diff @@
## main #13307 +/- ##
==========================================
+ Coverage 72.58% 73.06% +0.48%
==========================================
Files 469 469
Lines 38389 39178 +789
==========================================
+ Hits 27864 28627 +763
- Misses 8754 8757 +3
- Partials 1771 1794 +23
Flag | Coverage Δ | |
---|---|---|
all | 73.06% <76.00%> (+0.48%) |
:arrow_up: |
Flags with carried forward coverage won't be shown. Click here to find out more.
Impacted Files | Coverage Δ | |
---|---|---|
etcdctl/ctlv2/command/util.go | 3.93% <0.00%> (ø) |
|
server/config/config.go | 79.76% <ø> (ø) |
|
server/etcdmain/util.go | 6.00% <0.00%> (ø) |
|
server/etcdserver/api/rafthttp/transport.go | 85.71% <ø> (ø) |
|
server/etcdserver/api/v2discovery/discovery.go | 63.31% <0.00%> (ø) |
|
server/embed/etcd.go | 74.95% <40.00%> (ø) |
|
pkg/proxy/server.go | 61.08% <100.00%> (+0.06%) |
:arrow_up: |
server/embed/config.go | 73.71% <100.00%> (+0.13%) |
:arrow_up: |
server/embed/serve.go | 63.01% <100.00%> (ø) |
|
server/etcdmain/config.go | 85.36% <100.00%> (+0.10%) |
:arrow_up: |
... and 29 more |
Continue to review full report at Codecov.
Legend - Click here to learn more
Δ = absolute <relative> (impact)
,ø = not affected
,? = missing data
Powered by Codecov. Last update 3dce380...8aeb985. Read the comment docs.
Finally make tests passed. I think the one failed is unrelated to my change
Hey @yishuT, Great work on the server rotation and test. Would love to have this change in v3.6, however I might not have enough time to properly review this PR. I will be busy addressing the recent data inconsistency https://github.com/etcd-io/etcd/blob/main/Documentation/postmortems/v3.5-data-inconsistency.md, so it might take me couple of weeks before I get back to reviewing this one. Etcd needs to prioritize stability over new features now.
Sorry that I'm unable to this is PR attention it deserves.
Hey, that's totally fine. Thanks for the review.
@serathius qq, to test root CA rotation, I need to create clients with tls config signed by different root CAs. The current Cluster
interface in test framework does not allow me to do that. Therefore, I might need to make change to the interface. Do you prefer it to be in this PR which already changes many pieces or in a separate PR?
This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.
Not stale
cc @spzala @ahrtr to help review
Thanks @yishuT for the PR and sorry for the late response.
I will take a look after you resolve the conflict.
thanks. Will try to get to this PR by end of next week
@ahrtr please review
Sorry for the late response, I am working on https://github.com/etcd-io/etcd/issues/14370 right now. This PR will be a priority when I finish that one.
I do not get time to review this PR so far, but 1K+ lines of code change seems too big to me. Please consider to break the PR if possible. Will have a closer look later.
Sorry for the late response, I am working on #14370 right now. This PR will be a priority when I finish that one.
I do not get time to review this PR so far, but 1K+ lines of code change seems too big to me. Please consider to break the PR if possible. Will have a closer look later.
this PR touches the TlsInfo, so I need to make change everywhere...
@ahrtr I have a proposal to break this PR into
- TlsInfo -> *TlsInfo so that it's not copied everywhere
- Introduce the root CA rotation logic
sg?
@ahrtr hey, I would like to finish this PR. As discussed before, this PR is too big to review. I am going to split this pr into couple parts.
- Make TlsInfo -> *TlsInfo. The reason is that to reload root ca, we need to have a background goroutine.
- do the real root ca rotation.
wdyt?
This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.
Hi,
I saw this PR has been opened for two years without significant progress. I wonder if anything I can help to move forward this PR.
From my side, I am from Azure Kubernetes Service team. We do have requirement to rotate etcd CA in zero-downtime. This PR seems to be able to solve the problem. If the community needs additional help to merge this PR, please don't hesitate to ping me.
I can resume the work if the community can review the PR. I didn't receive any response earlier if they like my proposal. maybe @ahrtr ?
We can split the PR into two as @ahrtr mentioned. @yishuT you need help with that?
We can split the PR into two as @ahrtr mentioned. @yishuT you need help with that?
I don't have time to work on it recently. Feel free to take it over
Fixes #11555
Thanks. After an analysis on this PR, I think we can split this PR in this way:
- Provide implementation for "GetConfigForClient". This will allow server CA rotation to work in term of functionality.
- Optimize the loading of CA files. In particular, avoid loading the CA files repeatedly on each request.
I am working on the first part: https://github.com/etcd-io/etcd/pull/16500
@yishuT
Why did this issue get closed? The problem still exists and all of the PRs that were created are still outstanding waiting for maintainer review.
@yishuT
Why did this issue get closed? The problem still exists and all of the PRs that were created are still outstanding waiting for maintainer review.
This pr is pending review for yrs. also I don't have time to work on it recently while there is another pr trying to solve the same problem so I closed this pr
My bad, I thought this was the primary issue tracking the problem not a PR. I should have paid more attention.
Thanks for trying to make this better for all of us. I wish that it had been accepted back when you had opened it.