cloud-provider-openstack
Inhibit Configuration Caching
What this PR does / why we need it:
While caching is good from a performance perspective, I believe the benefits are minuscule in this instance. It also doesn't handle credential rotation, which any security-conscious organisation will tell you is a good thing: it means exfiltration doesn't have a lasting impact.
Which issue this PR fixes (if applicable): fixes #2284
Special notes for reviewers:
Release note:
Prior to this release, cloud configs were cached to improve performance; however, this had the side effect of ignoring credential rotation.
The new `--cache-cloud-config` flag allows this behavior to be tailored to your environment.
The default is `true` to maintain the current behavior for backward compatibility.
You can inhibit caching with the `--cache-cloud-config=false` CLI flag.
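For illustration only, here is a minimal sketch of how such a flag could gate the caching; the `cloudProvider` stand-in, helper names, and config path are hypothetical and not the PR's actual code:

```go
package main

import (
	"flag"
	"fmt"
	"os"
	"sync"
)

// cloudProvider is a stand-in for the real provider type; it only records
// which config bytes it was built from.
type cloudProvider struct{ config []byte }

// newCloudProvider stands in for "parse the config and authenticate with Keystone".
func newCloudProvider(path string) (*cloudProvider, error) {
	b, err := os.ReadFile(path)
	if err != nil {
		return nil, err
	}
	return &cloudProvider{config: b}, nil
}

// cacheCloudConfig mirrors the --cache-cloud-config flag described above;
// the default of true preserves the existing caching behaviour.
var cacheCloudConfig = flag.Bool("cache-cloud-config", true,
	"cache the parsed cloud config and the resulting client")

var (
	mu     sync.Mutex
	cached *cloudProvider
)

// getCloudProvider returns the cached provider when caching is enabled,
// otherwise it re-reads the config (and would re-authenticate) on every call.
func getCloudProvider(path string) (*cloudProvider, error) {
	mu.Lock()
	defer mu.Unlock()
	if *cacheCloudConfig && cached != nil {
		return cached, nil
	}
	p, err := newCloudProvider(path)
	if err != nil {
		return nil, err
	}
	if *cacheCloudConfig {
		cached = p
	}
	return p, nil
}

func main() {
	flag.Parse()
	p, err := getCloudProvider("/etc/config/cloud.conf") // hypothetical path
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("loaded %d bytes of cloud config\n", len(p.config))
}
```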
Welcome @spjmurray!
It looks like this is your first PR to kubernetes/cloud-provider-openstack 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.
You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.
You can also check if kubernetes/cloud-provider-openstack has its own contribution guidelines.
You may want to refer to our testing guide if you run into trouble with your tests not passing.
If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!
Thank you, and welcome to Kubernetes. :smiley:
Hi @spjmurray. Thanks for your PR.
I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.
Once the patch is verified, the new status will be reflected by the ok-to-test label.
I understand the commands that are listed here.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
The issue is open at https://github.com/kubernetes/cloud-provider-openstack/issues/2284 if you want to discuss this at more length.
/ok-to-test
/lgtm
Incidentally, the primary cost of this is going to be additional hits on Keystone, although I think it's fair to let users make this determination themselves. The way we handle this in OpenShift is with an annotation on the deployed pod based on a hash of the secret contents; this just means we create a new pod if the secret changes.
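As a rough sketch of that hash-annotation idea (not OpenShift's actual implementation; the annotation key and helper name here are made up):

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// secretHash returns a deterministic hash of a Secret's data, suitable as a
// pod-template annotation value; keys are sorted so the result is stable
// regardless of map iteration order.
func secretHash(data map[string][]byte) string {
	keys := make([]string, 0, len(data))
	for k := range data {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	h := sha256.New()
	for _, k := range keys {
		h.Write([]byte(k))
		h.Write(data[k])
	}
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	data := map[string][]byte{"clouds.yaml": []byte("clouds: {}")}
	// Putting this on the Deployment's pod template means the template (and
	// hence the pod) changes whenever the secret changes, triggering a rollout.
	fmt.Printf("example.io/cloud-credentials-hash: %s\n", secretHash(data))
}
```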
Do you intend to make this Cinder-only, or apply it to the whole CPO (including Manila, OCCM, etc.)? Or could we merge this and then do the other updates afterwards?
From my point of view this feature has to be supported at a lower level, in pkg/client/client.go. I need some time to take a deeper look into this issue.
From my point of view this feature has to be supported at a lower level, in pkg/client/client.go. I need some time to take a deeper look into this issue.
What are you thinking here? I don't think we'd ever want to update an existing client with new credentials: you'd always create a new one with the new credentials. I'm not convinced our client wrapper needs to support this directly.
What are you thinking here?
I'm thinking about credentials rotation only when old credentials are failing. Do you have objections?
What are you thinking here?
I'm thinking about credentials rotation only when old credentials are failing. Do you have objections?
That behaviour would be surprising to me, as well as complex to implement. I'm interested to know if this would be surprising to other users, or if these are just my expectations. I'm thinking that a credential rotation workflow is going to go something like:
- Create new credentials
- Update credentials in cluster
- Ensure that everything is working correctly. Rollback if not.
- Delete old credentials
I think a user expects that when they update cluster credentials they will be used immediately, which means that they have validated the new credentials in step 3. If we keep using the old credentials until they fail, the user has not validated the credentials in step 3. If there's a problem they might be surprised in step 4 when everything fails and there is no longer a rollback option.
To keep things simple to implement, test, and debug, I think that if we get new credentials we should use them immediately, and start failing immediately if they don't work.
Oh wow, I missed all this chatter; I was commenting in the issue. I did mention just now that we could do something a bit more clever by polling the file on disk and comparing it with the version that was used to initialize the client. If that changes, then we can perform an atomic update of the client. That'll behave the same, but alleviate the load on Keystone. Critiques?
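A minimal sketch of that polling-and-atomic-swap idea, assuming a hypothetical `client` stand-in and config path; in the real thing `newClient` would parse the config and authenticate with Keystone, so Keystone is only hit when the file actually changes:

```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"sync/atomic"
	"time"
)

// client is a stand-in for the authenticated OpenStack client.
type client struct {
	config []byte // the config bytes this client was built from
}

// newClient stands in for "parse the config and authenticate with Keystone".
func newClient(config []byte) *client {
	return &client{config: config}
}

// watchCloudConfig polls path and atomically swaps in a new client whenever
// the file contents differ from those used to build the current client.
func watchCloudConfig(path string, interval time.Duration) (*atomic.Pointer[client], error) {
	initial, err := os.ReadFile(path)
	if err != nil {
		return nil, err
	}
	var current atomic.Pointer[client]
	current.Store(newClient(initial))

	go func() {
		for range time.Tick(interval) {
			latest, err := os.ReadFile(path)
			if err != nil || bytes.Equal(latest, current.Load().config) {
				continue // unreadable or unchanged: keep the existing client
			}
			current.Store(newClient(latest)) // atomic update, readers never block
		}
	}()
	return &current, nil
}

func main() {
	c, err := watchCloudConfig("/etc/config/cloud.conf", 30*time.Second) // hypothetical path
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("client built from %d bytes of config\n", len(c.Load().config))
}
```

Using an atomic pointer means readers never block on a reload; they simply pick up the new client on their next call.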
I've spent a while optimizing the auth logic from both the code and the API performance perspective. Before, each CPO component had its own auth logic and the code duplication was insane. This PR doesn't look good because:
- In most cases the OCCM and other repo components share the same credentials. Implementing a single change only in the Cinder CSI will cause inconsistencies between CPO components.
- The `GetOpenStackProvider` func is called only once in `cmd/cinder-csi-plugin/main.go` and it's not looped, so the `cacheCloudConfig` global var doesn't make any sense.
- Credentials rotation must be done in a different way and the logic must consider all CPO components.
I'd close this PR and move the discussion on how to achieve a proper creds rotation into an issue.
/hold
New changes are detected. LGTM label has been removed.
Yeah @mdbooth, as mentioned, it's just a PoC. I can see there are definitely efforts to coalesce the OpenStack client code into a canonical blob, so I was hoping the problem would go away eventually. If I get some cycles free I may have a look to see whether the CSI code can share what's been done with Manila et al. CSI seems to be in its own little world at present.
Okay @mdbooth, down the rabbit hole we go... 😄 Here's what I came up with in an attempt to modularize the hot-reload stuff. There are two levels: the easy one, then the hard one. I hit a bit of a roadblock, as you can see in the comments, with the ingress controller and the main thing driven by k8s.io/cloud-provider. But you can get the gist.
I'm really hoping @kayrus can provide some guidance here.
@mdbooth I'll check the PR later. Thanks for your patience.
I believe what we essentially want is to conceptualize this the way a typical controller-runtime app works:
- The filewatcher stuff acts as an API watch call
- You can either
  - Read the config each time from the cache and create a new provider/client, but as we've discussed that's pretty crap because you need to do the token-exchange, authenticate, get-the-catalog dance with Keystone. This does have the benefit that anything relying on the config will get updates.
  - Cache clients as we do now to avoid the authentication dance, but use a "shared informer", i.e. the subscription stuff, to do whatever updates we can (a rough sketch follows below).
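A rough sketch of that second option, purely for illustration; the `configBroker`, `Subscribe`, and `Publish` names are invented here and are not the factory from this PR:

```go
package main

import (
	"fmt"
	"sync"
)

// configSubscriber is notified whenever the cloud config changes; think of it
// as the "shared informer" event handler in the analogy above.
type configSubscriber func(config []byte)

// configBroker fans change notifications out to all subscribers, so each
// component can decide for itself how to update (or rebuild) its client.
type configBroker struct {
	mu          sync.Mutex
	subscribers []configSubscriber
}

func (b *configBroker) Subscribe(s configSubscriber) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.subscribers = append(b.subscribers, s)
}

// Publish would be driven by the file watcher (the "API watch" in the analogy).
func (b *configBroker) Publish(config []byte) {
	b.mu.Lock()
	defer b.mu.Unlock()
	for _, s := range b.subscribers {
		s(config)
	}
}

func main() {
	broker := &configBroker{}
	broker.Subscribe(func(config []byte) {
		// A component would rebuild and re-authenticate its client here.
		fmt.Printf("cinder-csi: rebuilding client from %d bytes of config\n", len(config))
	})
	broker.Publish([]byte("[Global]\nauth-url = https://keystone.example.com/v3\n"))
}
```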
Apologies for the delay. I'll try to take a look at this PR this week. Thanks for your patience.
No rush! I did have a thought, though, about k8s.io/cloud-provider. Is that just legacy stuff from when it lived in-tree, and thus could be removed?
k8s.io/cloud-provider is a common package for all the other cloud providers, primarily Go interfaces.
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign kayrus for approval. For more information see the Kubernetes Code Review Process.
The full list of commands accepted by this bot can be found here.
Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment
Okay @kayrus I think we're starting to zero in on something good... let me know what you think!
Of note, I've not committed it yet, but I found a solution for getting the OCCM to auto-rotate credentials. It seems we register a cloud provider, then immediately call InitCloudProvider with a *File... from there, in the callback, we can type assert from io.Reader back to a file, extract the Name(), and set up the polling file watcher. But that's a pretty huge change as it affects a lot of the testing jazz, so it would probably be better as a follow-up. Like this a lot:
commit 2346c93af6e44f5de911861a527a05a08c25961c (HEAD -> occm_cloudconfig_port)
Author: Simon Murray <[email protected]>
Date: Tue Aug 22 15:00:18 2023 +0100
Port OCCM to use CloudConfigFactory
pkg/openstack/openstack.go | 152 ++++++++++++++++-----------------
pkg/openstack/openstack_test.go | 295 ++++++++++++++++-------------------------------------------------
pkg/openstack/routes_test.go | 19 +++--
3 files changed, 163 insertions(+), 303 deletions(-)
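A minimal sketch of the io.Reader type assertion described a few comments up (the config path and helper name are made up for illustration):

```go
package main

import (
	"fmt"
	"io"
	"os"
)

// configSource extracts the backing file path from the io.Reader handed to a
// cloud provider's init callback, when that reader is actually an *os.File.
// With the path in hand, a polling file watcher can be set up for reloads.
func configSource(r io.Reader) (string, bool) {
	f, ok := r.(*os.File)
	if !ok {
		return "", false // e.g. a bytes.Reader in tests: no path to watch
	}
	return f.Name(), true
}

func main() {
	f, err := os.Open("/etc/config/cloud.conf") // hypothetical path
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	defer f.Close()

	if path, ok := configSource(io.Reader(f)); ok {
		fmt.Println("would start a polling watcher on", path)
	}
}
```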
The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.
This bot triages PRs according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the PR is closed
You can:
- Mark this PR as fresh with `/remove-lifecycle stale`
- Close this PR with `/close`
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all PRs.
This bot triages PRs according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the PR is closed
You can:
- Mark this PR as fresh with `/remove-lifecycle rotten`
- Close this PR with `/close`
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages PRs according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the PR is closed
You can:
- Reopen this PR with `/reopen`
- Mark this PR as fresh with `/remove-lifecycle rotten`
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/close
@k8s-triage-robot: Closed this PR.
In response to this:
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages PRs according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the PR is closed
You can:
- Reopen this PR with `/reopen`
- Mark this PR as fresh with `/remove-lifecycle rotten`
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/close
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.