enhancements icon indicating copy to clipboard operation
enhancements copied to clipboard

KEP-4322: Add credentials definitions

Open ryanzhang-oss opened this issue 1 year ago • 11 comments

  • One-line PR description: Add the definition of cluster credential in the clusterProfile API
  • Issue link:
  • Other comments:

ryanzhang-oss avatar Aug 01 '24 01:08 ryanzhang-oss

CLA Signed

The committers listed above are authorized under a signed CLA.

  • :white_check_mark: login: ryanzhang-oss / name: Ryan Zhang (22372a7fde3b455f6a03a912e501cf426c3d7747)

Hi @ryanzhang-oss. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

k8s-ci-robot avatar Aug 01 '24 01:08 k8s-ci-robot

/assign @skitt

mikeshng avatar Aug 02 '24 18:08 mikeshng

/ok-to-test

skitt avatar Aug 05 '24 09:08 skitt

BTW once the structure is ready (which it probably is), you need to run make update-toc and include the changes that produces.

skitt avatar Aug 06 '24 16:08 skitt

During the SIG-Multicluster meeting today we discussed looking for precedence regarding cues for how a project/controller should identify shared resources it should care about or consume, to inform the proposed - consumer: "consumerName" pattern proposed here.

In Gateway API I believe an equivalent would be the GatewayClass controllerName field - it's similarly a well-known string, the primary difference being that it is required to be domain-prefixed (e.g. company.io/projectName) to avoid collisions. GatewayClass is also a separate resource that should only be watched by a single responsible controller, but it does have a good bit more functionality than what's being proposed here for each list item.

mikemorris avatar Aug 06 '24 17:08 mikemorris

Fwiw, I'm still wrestling with the credentials and permission issue.

I'm wondering how this interfaces with potential CI/CD. The worst that we could do is make this a CI/CD (e.g. registering is the enable and the clusterManagers becomes a CI/CD for multicluster-features).

In the case where a federated WI is available, generated credentials per cluster are not necessary, so ClusterProfile would ~only need to contain the Endpoint. It sounds we should make sure "Endpoint" is a first class thing.

Then there is always the question of: Push vs Pull and distributing RBAC.

If Pull:

  • whatever runs on the cluster can be a single agent Or per-feature agent and those agents are the ones needing credentials where the cluster inventory is (this design doesn't solve that)
  • I think the work api here would be the most likely setup?

If Push:

  • the controller where the cluster inventory resides would need permission in every cluster. This sounds like a task for CI/CD
  • that controller would also need credentials (a cert/token) if there is no federated identity.

For either scenario, it seems that something with "Cluster-admin" in both (or either) of the clusters would need to be creating and distributing the right permissions. It sounds that this cluster-admin thing is whatever CI/CD solution is used.

Which brings the questions: 1/ How is the CI/CD setup? Do we need to standardize that? 2/ Do we need a new credential/permission setup, effectively a parallel CI/CD track? 3/ Can this API be implemented with the popular CI/CD that are already used?

Given this would be a security-minded feature (handling credentials and permissions), we need to make sure we encourage safe-by-default patterns.

corentone avatar Sep 03 '24 17:09 corentone

I made the following commit to illustrate some of my issues and potential solutions: https://github.com/ryanzhang-oss/enhancements/compare/KEP4322-credential...corentone:enhancements:KEP4322-credential

I think adding those would get us closer to a solution that's a little bit more specific on what govern the credentials and especially the definition of consumer. (maybe we should rename it, but Im bad at naming. )

corentone avatar Oct 14 '24 21:10 corentone

Some high level thoughts.

If the goal "there a spoke cluster credentials, divided into per-component namespaces, retrievable via ClusterProfile.name and component.name", then a ClusterProfile API field isn't necessary.

  1. make a namespace per component (already called out in this design)
  2. components must already know their names and can either
    1. list all spoke cluster credentials in the namespace
    2. list using a label selector for the spoke credential they need
  3. components maintain credentials for a spoke cluster across all components can
    1. list all secrets across all namespaces with label selector (see the authorization by label selector that is now possible in kube)

The format of what is stored in that secret still needs standarization, but the linkage is from the lower level up, not the higher level down. Such a structure scales higher as there aren't centralized clusterprofiles/status rally points to conflict on.

Now separately to the point that Mo raised about holding cluster credentials on a cluster. The general direction of sig-auth in kube is to try to externalize secret content. From a practical perspective, what this means for extensions is that flexibility in how per-cluster credentials are delivered to extension points is a good idea. Having a kubeconfig on the hub cluster is an easy way to start, but not a great end state. That doesn't mean it should be disallowed, but perhaps brainstorming a bit on what a more desireable end state is would be helpful.

A potentially better end state could be a spoke-cluster-bound-SA-token service that accepts a hub cluster bound SA token, spoke cluster, and desired component. This service could then retrieve a bound SA token for the spoke cluster and return it to the component in the hub cluster. In this situation, the only component with long lived credentials would be the spoke-cluster-bound-SA-token service. While not perfect, every other component would have no long lived credentials on the hub cluster. This approach

  1. limits the number of secrets, improving scalability
  2. limits the number of long lived credentials in the hub cluster, improving security
  3. makes it practical for the spoke-cluster-bound-SA-token service to use an off-cluster volume driver to store its long lived (or potentially not) credentials, which would be impractical to ask every component to do.

deads2k avatar Oct 15 '24 21:10 deads2k

Here is another alternative.

To securely manage credentials for each cluster via ClusterProfile, we could generate a scope-limited token for each spoke cluster. These restricted tokens can only be used to request higher-privilege tokens via the TokenRequest API. Whether these higher-privilege tokens are granted is controlled by the spoke cluster itself (or by webhooks). Additionally, users or consumers are responsible for storing these higher-privilege tokens securely.

This helps enhance security by minimizing credential exposure, offers flexibility through dynamic access control, and simplifies multi-cluster credential management.

dixudx avatar Oct 17 '24 03:10 dixudx

@mikeshng: changing LGTM is restricted to collaborators

In response to this:

/lgtm

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

k8s-ci-robot avatar Oct 30 '24 13:10 k8s-ci-robot

lgtm, I think we are in the correct direction.

qiujian16 avatar Oct 31 '24 02:10 qiujian16

A couple of minor points left to address, but this is looking good, thanks @ryanzhang-oss! Could you squash the commits before we merge this?

skitt avatar Nov 04 '24 10:11 skitt

Thanks!

/approve

Technically this still needs an LGTM from a member, @mikemorris would you care to do the honours, assuming the PR meets with your approval?

skitt avatar Nov 05 '24 10:11 skitt

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: corentone, mikeshng, ryanzhang-oss, skitt, zhiying-lin

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot avatar Nov 05 '24 10:11 k8s-ci-robot

Happy with how this has evolved, thanks for the patience incorporating feedback!

/lgtm

mikemorris avatar Nov 05 '24 22:11 mikemorris