
Allow impersonating service accounts in arbitrary namespaces on remote clusters

Open schrej opened this issue 2 months ago • 20 comments

Context

We are using flux to deploy Kustomizations and HelmReleases to remote clusters by specifying a KubeConfig. Since the remote clusters are deployed using Cluster API, we use the cluster-admin Kubeconfig secret created by CAPI to do so. With spec.serviceAccountName we're able to impersonate ServiceAccounts in the remote cluster as long as the SA resides in a namespace with the same name as the namespace of the Kustomization/HelmRelease on the local cluster.

In our case, the ServiceAccount exists in a different namespace on the remote cluster than the Kust/HR on the local cluster. Therefore we need to specify the namespace.

I assume that only allowing a name, but not a namespace, to be specified was done due to security considerations, which I totally agree with when impersonating on the local cluster. We could create a custom Kubeconfig with the SA credentials, but that would be quite a lot of effort.

Request

Allow specifying a ServiceAccount namespace for Kustomizations and HelmReleases.

Things to consider:

  • maybe allow using spec.targetNamespace by setting a property
  • only allow it when a custom Kubeconfig is specified, since that allows circumventing any restrictions anyway
  • add a flag to controllers to enable this feature
  • allow specifying the full SA name (system:serviceaccount:<namespace>:<name>) to avoid a new property

schrej avatar Sep 29 '25 16:09 schrej

In a RoleBinding you can specify a ServiceAccount from a different namespace :)

matheuscscp avatar Sep 29 '25 17:09 matheuscscp

In a RoleBinding you can specify a ServiceAccount from a different namespace :)

Are you talking about a Kubernetes RBAC RoleBinding or did I miss something?

I'm not looking for a way to give a ServiceAccount appropriate permissions, I need to use a ServiceAccount in a specific namespace on a remote cluster.

For context: we're building a k8s platform and are implementing support for addons that are managed by 3rd-party teams. As part of the definition of an addon, a service account with appropriate permissions is specified. On the local cluster we're using one namespace per cluster we manage from it, e.g. cluster-dev1. When deploying an addon, we're creating an addon-<name> namespace on the remote cluster. Then we're creating the specified ServiceAccount there. Now we need a way to use that ServiceAccount when deploying to the remote cluster, even though the Kust/HR resides in the cluster-dev1 namespace on the local cluster.

schrej avatar Sep 30 '25 10:09 schrej

We will most likely not implement support for impersonating a ServiceAccount in a different namespace, so I'll lay out here all the alternatives. (We will not add this change to the Flux APIs as it would seriously deviate from what Kubernetes does, sorry!)

Alternative using impersonation (.spec.serviceAccountName)

In this case, it's possible to grant all the RBAC permissions to the ServiceAccount on the remote cluster by creating RoleBinding objects and binding them to the ServiceAccount. The namespace of the RoleBinding object and the namespace of the ServiceAccount object do not need to be the same; Kubernetes allows namespaced RBAC grants to reference ServiceAccounts from other namespaces.
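For illustration, a RoleBinding in the namespace where the objects get applied can reference a ServiceAccount that lives in a different namespace (all names and namespaces below are placeholders matching the example setup in this thread):

apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: flux-addon-applier
  namespace: addon-foo           # namespace on the remote cluster where the addon is applied
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: admin                    # or a narrower, reviewed Role/ClusterRole
subjects:
  - kind: ServiceAccount
    name: flux-deployer          # the impersonated ServiceAccount...
    namespace: cluster-dev1      # ...which lives in a different namespace (matching the hub namespace)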

Alternative 1 not using impersonation (not recommended, as it involves a long-lived credential)

You can create a long-lived token for the ServiceAccount on the remote cluster like this:

apiVersion: v1
kind: Secret
metadata:
  name: long-lived-token
  namespace: the-ns-of-the-serviceaccount
  annotations:
    kubernetes.io/service-account.name: the-name-of-the-serviceaccount
type: kubernetes.io/service-account-token

This token does not expire. You can then create a kubeconfig with this token and use it with .spec.kubeConfig.secretRef.
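For illustration, the kubeconfig wrapping that token could be stored in a Secret like the one below and referenced via .spec.kubeConfig.secretRef (server URL, CA data and names are placeholders; per the Flux docs the kubeconfig must be stored under the value or value.yaml key):

apiVersion: v1
kind: Secret
metadata:
  name: remote-cluster-kubeconfig
  namespace: cluster-dev1        # same namespace as the Kustomization/HelmRelease
stringData:
  value: |
    apiVersion: v1
    kind: Config
    clusters:
      - name: remote
        cluster:
          server: https://remote-cluster.example.com:6443
          certificate-authority-data: <base64-encoded CA bundle>
    users:
      - name: remote-sa
        user:
          token: <token from the long-lived Secret above>
    contexts:
      - name: remote
        context:
          cluster: remote
          user: remote-sa
    current-context: remote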

Alternative 2 also not using impersonation (recommended)

You can use the secret-less authentication feature for remote clusters that we (literally) just released in Flux 2.7:

  • kustomize-controller: https://fluxcd.io/flux/components/kustomize/kustomizations/#secret-less-authentication
  • helm-controller: https://fluxcd.io/flux/components/helm/helmreleases/#secret-less-authentication

This feature supports 4 providers:

  • aws for EKS
  • azure for AKS
  • gcp for GKE
  • generic for https://kubernetes.io/docs/reference/access-authn-authz/authentication/#configuring-the-api-server

With the generic provider you can map the claims of the token issued for the ServiceAccount on the local/hub cluster to a username on the remote/spoke cluster. This username can be the ServiceAccount you want on the remote/spoke cluster, e.g. system:serviceaccount:<namespace>:<name>.
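To sketch how such a mapping could look on the remote/spoke cluster, here is a structured authentication configuration passed to the kube-apiserver via --authentication-config (issuer URL and audience are placeholders; the exact apiVersion depends on the Kubernetes version):

apiVersion: apiserver.config.k8s.io/v1beta1
kind: AuthenticationConfiguration
jwt:
  - issuer:
      url: https://hub-cluster-issuer.example.com   # the hub cluster's ServiceAccount token issuer
      audiences:
        - spoke-cluster
    claimMappings:
      username:
        # The sub claim of a ServiceAccount token is already
        # system:serviceaccount:<namespace>:<name>, so an empty prefix
        # maps it 1:1 to a username on the spoke cluster.
        claim: sub
        prefix: ""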

matheuscscp avatar Sep 30 '25 11:09 matheuscscp

Thanks for the detailed response. The second alternative looks interesting, and I wasn't even aware that was possible. I'd improve the documentation around it a bit though. If you're unfamiliar with the topic, it's not obvious how this is supposed to work with the generic provider and cross-cluster SA token validation (it took me a while at least). It does require a substantial configuration (& architecture) change, so I'm not sure whether this will be a viable approach for us.

I would still prefer impersonation. I do understand the concerns about impersonating a SA in a different namespace, as namespaces are commonly used for access management. Not being able to use SAs in different namespaces absolutely makes sense within a single cluster. But when interacting with a remote cluster using a Kubeconfig, impersonation should be limited by the permissions of the user in the Kubeconfig, and not be restricted to a namespace with the same name on a completely different cluster.

There are other options for us to work around this limitation, like creating matching namespaces on the local/hub cluster and creating the Kustomizations there. But that requires syncing Kubeconfig secrets and scatters resources, which isn't great.

We will probably go with long-lived credentials for now, since that is the easiest option.

I would be happy to help with the implementation of this feature, if you accept it.

schrej avatar Oct 01 '25 11:10 schrej

I'm also in favor of having the option to specify the namespace or even use the notation system:serviceaccount:<namespace>:<service account name>.

A use-case would be for tenancy with capsule. All serviceAccounts for tenant users are managed in the same namespace (e.g. tenant-system) and added as owners in the corresponding Tenant. Additionally all serviceAccounts can be added as tenant userGroups: https://projectcapsule.dev/docs/tenants/permissions/#group-scope

adberger avatar Oct 01 '25 11:10 adberger

We could do it like this: we introduce .spec.serviceAccountNamespace for Kustomization and HelmRelease. This field can only be used when .spec.kubeConfig is also specified. This feature should also be placed behind the gate EnableServiceAccountNamespace, which should be opt-in (forever, never opt-out).
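A sketch of what this could look like on a Kustomization (hypothetical; serviceAccountNamespace does not exist in any released Flux API):

apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: addon-foo
  namespace: cluster-dev1
spec:
  interval: 10m
  prune: true
  sourceRef:
    kind: GitRepository
    name: addon-foo
  kubeConfig:
    secretRef:
      name: remote-cluster-kubeconfig
  serviceAccountName: addon-foo-deployer
  # Hypothetical field, only honored when .spec.kubeConfig is set and the
  # EnableServiceAccountNamespace feature gate is enabled on the controller:
  serviceAccountNamespace: addon-foo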

I would accept this feature (a bit reluctantly), but this is a very sensitive matter and other maintainers (@stefanprodan, @stealthybox) need to be convinced as well.

matheuscscp avatar Oct 01 '25 11:10 matheuscscp

I'd also consider .spec.impersonation.username and .spec.impersonation.groups.

matheuscscp avatar Oct 01 '25 11:10 matheuscscp

@matheuscscp You mind explaining why only when .spec.kubeConfig is set (not wanting to sound accusatory)? I really like the fact that new features are well considered in Flux tbh.

adberger avatar Oct 01 '25 12:10 adberger

@matheuscscp You mind explaining why only when .spec.kubeConfig is set (not wanting to sound accusatory)?

The use case I'm seeing here that (kinda) makes sense to me is when you're applying things in a remote cluster. I can see how mirroring namespaces across the hub and spoke clusters may sound a bit unneeded/verbose.

Same cluster should remain like Pod.spec.serviceAccountName. Pods can't use a ServiceAccount from a different namespace, why should Kustomizations and HelmReleases be able to do so?

matheuscscp avatar Oct 01 '25 13:10 matheuscscp

@matheuscscp You mind explaining why only when .spec.kubeConfig is set (not wanting to sound accusatory)?

The use case I'm seeing here that (kinda) makes sense to me is when you're applying things in a remote cluster. I can see how mirroring namespaces across the hub and spoke clusters may sound a bit unneeded/verbose.

Same cluster should remain like Pod.spec.serviceAccountName. Pods can't use a ServiceAccount from a different namespace, why should Kustomizations and HelmReleases be able to do so?

I see, so this is more like a design decision rather than a technical impediment.

adberger avatar Oct 01 '25 15:10 adberger

One of the rules in Flux API design is that we don't go against Kubernetes namespaced object design. In Kubernetes a Pod can't run with an SA in a different namespace, nor can it mount a Secret/ConfigMap from a different namespace. I'm not for breaking this rule.

impersonation should be limited by the permissions of the user in the Kubeconfig

Really curious how you are achieving this: are you generating a Kubeconfig for each user and a dedicated role binding for verb: impersonate in the target cluster per user with the SA hardcoded? Can you please post here the RBAC you are using?
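For reference, a per-user impersonation grant with the SA hardcoded might look something like this on the target cluster (all names are hypothetical):

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: impersonate-addon-foo-deployer
rules:
  - apiGroups: [""]
    resources: ["serviceaccounts"]
    verbs: ["impersonate"]
    resourceNames: ["addon-foo-deployer"]   # only this SA may be impersonated
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: impersonate-addon-foo-deployer
  namespace: addon-foo                      # scopes impersonation to the SA in this namespace
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: impersonate-addon-foo-deployer
subjects:
  - apiGroup: rbac.authorization.k8s.io
    kind: User
    name: cluster-dev1-deployer             # the user in the generated Kubeconfig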

stefanprodan avatar Oct 01 '25 16:10 stefanprodan

impersonation should be limited by the permissions of the user in the Kubeconfig

Really curious how you are achieving this: are you generating a Kubeconfig for each user and a dedicated role binding for verb: impersonate in the target cluster per user with the SA hardcoded? Can you please post here the RBAC you are using?

We aren't. For us it's the admin Kubeconfig created by CAPI. In my opinion the namespace restrictions make less sense for cross-cluster interaction. It's not the same namespace, as it's on two separate clusters. If you want to impose restrictions on what can be done on the remote cluster, those restrictions should be enforced by the Kubeconfig user's permissions, not by namespaces. The Kubeconfig should be (and is) required to be in the same namespace. But if you can impersonate SAs in different namespaces with that Kubeconfig, that should be allowed in my opinion. After all, if you don't specify a SA, you'll have all of the Kubeconfig's permissions as well. Users that still want to rely on the current behavior can just keep the feature disabled.

More context: In our management (local/hub) clusters we primarily use namespaces for organization, not to restrict access. Our tenant clusters (remote/spoke) have per-cluster tenancy. Still, namespaces are partly used for restrictions to separate platform and tenant components running there. We now want to introduce components that are provided as a service by separate teams, but are deployed to the tenant's cluster from our management clusters. Since we don't trust those teams with full cluster-admin access, we allow them to deploy a service account with their necessary RBAC rules (which we review), and then impersonate that service account when applying their HelmReleases and Kustomizations. On the tenant cluster, that service account and the deployments should live in an addon-<name> namespace, while everything related to one cluster (CAPI, flux, our own CRs) is in c-<cluster name> on the management cluster.

We're running on-prem, not in one of the big clouds, so Workload Identity seems like quite a bit of extra effort. And unnecessary, considering we already have the Kubeconfig available.

schrej avatar Oct 02 '25 09:10 schrej

We're running on-prem, not in one of the big clouds, so Workload Identity seems like quite a bit of extra effort. And unnecessary, considering we already have the Kubeconfig available.

It's a security improvement to reduce the amount of long-lived credentials. Sure, long-lived credentials are always easier to set up, usually in any environment. Yes, security hardening often has extra costs.

matheuscscp avatar Oct 02 '25 11:10 matheuscscp

It's a security improvement to reduce the amount of long-lived credentials. Sure, long-lived credentials are always easier to set up, usually in any environment. Yes, security hardening often has extra costs.

I agree. But as long as Cluster API doesn't fully adopt it, we'll have the Kubeconfig anyway.

schrej avatar Oct 02 '25 15:10 schrej

@schrej, thanks for explaining your use-case with personas and namespace examples. I don't think Flux's current namespace matching behavior is preventing you from accomplishing what you need, and I think there are some considerations you, as a platform owner, should be careful about regarding where these ServiceAccounts live in the target cluster.

On the tenant cluster, that service account and the deployments should live in an addon-<name> namespace, while everything related to one cluster (CAPI, flux, our own CRs) is in c-<cluster name> on the management cluster.

I don't recommend putting a ServiceAccount that can manage the CRDs/Pods/NetworkPolicies/Services/Certificates of an addon in the same namespace where the addon Pods are deployed. Any Pod in that namespace created with that serviceAccountName would be able to escalate privileges and manage itself. This is typically never desirable.

When you have the service-owner request permissions to manage their addon, just ask them to provide you the necessary Roles, namespaced ClusterRoles, and cluster-wide ClusterRoles. You can then RoleBind to those Roles/ns-ClusterRoles from ns/addon-name to the hub-cluster SA from ns/c-cluster for namespaced permissions and ClusterRoleBind to ClusterRoles for cluster-wide ones.

Another way to think about this is to ignore their requested SA and replace the subjects in their requested bindings with your hub cluster's SA for the appropriate c-cluster namespace. This is a strong assertion that "this namespace is managed with privilege by something else entirely."

You should be able to patch your RBAC for that cluster specifically to match the cluster's name if you are already using separate folders to manage each cluster.

This is a practical security boundary. Matching the impersonated target-namespace to the hub-namespace prevents a different hub-namespace from using unintended permissions accidentally or maliciously. If this becomes trivially overridable, Flux loses the capability to delegate namespaces to different untrusted tenants, whereas keeping this mechanism doesn't inhibit target cluster owners from RoleBinding to the appropriate hub-namespace SA.

But if you can impersonate SAs in different namespaces with that Kubeconfig, that should be allowed in my opinion. After all, if you don't specify a SA, you'll have all of the Kubeconfig's permissions as well.

Cluster admins can currently prevent using the admin-kubeconfig by defaulting the SA name at the controller level.
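For reference, this is the multi-tenancy lockdown approach from the Flux docs; one way to apply it is a kustomize patch along these lines (the --default-service-account flag is real, the surrounding patch layout is just an example):

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - gotk-components.yaml
  - gotk-sync.yaml
patches:
  - patch: |
      - op: add
        path: /spec/template/spec/containers/0/args/-
        value: --default-service-account=default
    target:
      kind: Deployment
      name: "(kustomize-controller|helm-controller)"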

I have brainstormed with the ClusterAPI folks about all sorts of KubeConfig features, but the current state of things is that most people's deployments have admin kubeConfigs all over the place that need to be shared with many teams. There is very little standardized api support to get details about the new Cluster's apiserver address/etc within ClusterAPI. The result is that most people operating ClusterAPI have no automated generation of least-privilege kubeconfigs. This seems to be the case for you as well:

We could create a custom Kubeconfig with the SA credentials, but that would be quite a lot of effort.

Flux's impersonation namespace-matching gives platform owners the ability to delegate and drop permissions to different folders/sources despite this unfortunate ecosystem reality. We need to be very careful with any changes to this code-path. Many people rely on it.

I encourage you to try RoleBinding from the tenant cluster's ns/addon to an SA in ns/c-cluster-name and see if that works for you to provide explicit, least-privilege access. You may need to patch the service-owner rolebindings per-cluster in your layout, but that is a small cost to ensure that other tenants are not managing each other's workloads.

stealthybox avatar Oct 02 '25 19:10 stealthybox

@adberger the Capsule use-case for Tenants is very cool -- are you mainly a contributor to Capsule or an end-user of the project? Capsule has an awesome integration with Flux where the security considerations are documented for the project's intended use-cases: https://projectcapsule.dev/docs/guides/use-fluxcd/

I didn't write that doc, and I have not read it fully, but they look to intentionally consume and integrate with our security model and API decisions.

Their support for x509 kubeConfigs works well with Flux, and they allow access to Tenants via groups which can be claimed via OIDC and our support for workload-identity.

This would also be consumable by the proposed arbitrary impersonation strings, but that would subvert Flux's current namespace promises.

stealthybox avatar Oct 02 '25 20:10 stealthybox

@adberger the Capsule use-case for Tenants is very cool -- are you mainly a contributor to Capsule or an end-user of the project? Capsule has an awesome integration with Flux where the security considerations are documented for the project's intended use-cases: https://projectcapsule.dev/docs/guides/use-fluxcd/

I didn't write that doc, and I have not read it fully, but they look to intentionally consume and integrate with our security model and API decisions.

Their support for x509 kubeConfigs works well with Flux, and they allow access to Tenants via groups which can be claimed via OIDC and our support for workload-identity.

This would also be consumable by the proposed arbitrary impersonation strings, but that would subvert Flux's current namespace promises.

@stealthybox Thanks for the input. We'll look into that and write our findings here.

adberger avatar Oct 09 '25 09:10 adberger

We took some time to discuss using Workload Identity and I did a small PoC to determine what's necessary to make this work in our setup. We haven't made a decision yet, but I wanted to share our considerations.

With the generic provider you can map the claims of the token issued for the ServiceAccount on the local/hub cluster to a username on the remote/spoke cluster. This username can be the ServiceAccount you want on the remote/spoke cluster, e.g. system:serviceaccount:<namespace>:<name>.

I assume the mentioned mapping is configured in the AuthenticationConfiguration of the apiserver? Or did I miss another way to do it? Mapping to a service account feels like a bad idea, as it could lead to unintentionally 'impersonating' service accounts. But we could strip the namespace and use unique spoke-cluster-wide names and bind to those (e.g. hub-cluster:<sa-name>). We still need to decide whether we're willing to set up 'workload identity' just for this.

If this becomes trivially overridable, Flux loses the capability to delegate namespaces to different untrusted tenants, whereas keeping this mechanism doesn't inhibit target cluster owners from RoleBinding to the appropriate hub-namespace SA.

That's why I was suggesting to enable this with a global configuration option. Having this restriction makes sense when untrusted tenants are allowed to configure resources on hub clusters. This is not the case for us, though. We are the only ones that configure flux on the hub cluster. We have full control over the flux resources, but we do not control the artifacts that are deployed with that flux configuration. By impersonating a SA with limited permissions, we only need to approve the RBAC configuration, but not the rest of the deployment. We can work around this restriction, but it forces us to create a mess, either on the hub or on the spoke cluster:

  • we could create all flux resources in an addon-<name> namespace on the hub, which would allow having the SA in the correct namespace on the spoke cluster, but makes it annoying to find all flux resources for a spoke cluster on the hub
  • we could create a c-<cluster-name> namespace on the spoke cluster, which keeps the hub cluster tidy, but requires an unnecessary reserved namespace on the spoke cluster just to create SAs.

I have brainstormed with the ClusterAPI folks about all sorts of KubeConfig features, but the current state of things is that most people's deployments have admin kubeConfigs all over the place that need to be shared with many teams. There is very little standardized api support to get details about the new Cluster's apiserver address/etc within ClusterAPI. The result is that most people operating ClusterAPI have no automated generation of least-privilege kubeconfigs. This seems to be the case for you as well:

No. In our case the admin KubeConfigs only exist in the management clusters and are used by Flux and Cluster API. Human users use OIDC to authenticate against the clusters. We have tools to help configure it for them. Most things that are applied to remote clusters by flux are created by us, so we are fine with using admin KubeConfigs for it. But in this case other teams create the manifests that are deployed, and we need to ensure they are not overstepping their boundaries with their deployments.

schrej avatar Oct 27 '25 14:10 schrej

Maybe this feature could help with using secret-less access:

https://github.com/controlplaneio-fluxcd/flux-operator/issues/463

matheuscscp avatar Oct 27 '25 15:10 matheuscscp

@adberger the Capsule use-case for Tenants is very cool -- are you mainly a contributor to Capsule or an end-user of the project? Capsule has an awesome integration with Flux where the security considerations are documented for the project's intended use-cases: https://projectcapsule.dev/docs/guides/use-fluxcd/ I didn't write that doc, and I have not read it fully, but they look to intentionally consume and integrate with our security model and API decisions. Their support for x509 kubeConfigs works well with Flux, and they allow access to Tenants via groups which can be claimed via OIDC and our support for workload-identity. This would also be consumable by the proposed arbitrary impersonation strings, but that would subvert Flux's current namespace promises.

@stealthybox Thanks for the input. We'll look into that and write our findings here.

@stealthybox The proposed solution with https://github.com/projectcapsule/capsule-addon-fluxcd isn't viable for us because of the following reasons:

  • ServiceAccounts still need to be added manually to the Tenant object as owners
  • ServiceAccounts still need to be added manually to the CapsuleConfiguration object as "users" or "groups" (system:serviceaccounts:cloud-service-system)

We've currently settled on the following approach, imitating some of the capsule-addon-fluxcd features (a rough sketch of the generate rule follows after the lists below):

Capsule CapsuleConfiguration:

  • Add all serviceAccounts from the namespace tenants to userGroups

Kyverno ClusterPolicy:

  • Whenever a Capsule Tenant is created, create a ServiceAccount in the namespace tenants with a long-lived API token (https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account/#manually-create-a-long-lived-api-token-for-a-serviceaccount)
  • Whenever a Capsule Tenant is created, add the created ServiceAccount as owner
  • Generate a kubeconfig Secret out of the API Token Secret
  • Automatically inject the kubeconfig Secret to each Kustomization & HelmRelease (.spec.kubeConfig.secretRef) created

Kyverno ClusterPolicy or Capsule GlobalTenantResource:

  • Replicate the kubeconfig Secret to each tenant namespace
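As mentioned above, a rough sketch of the generate rule creating the per-Tenant ServiceAccount (the Capsule Tenant GVK and all names are assumptions, and only the ServiceAccount step is shown; the token Secret and kubeconfig Secret would be additional generate rules following the same pattern):

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: tenant-serviceaccount
spec:
  rules:
    - name: create-tenant-sa
      match:
        any:
          - resources:
              kinds:
                - capsule.clastix.io/v1beta2/Tenant
      generate:
        apiVersion: v1
        kind: ServiceAccount
        name: "{{ request.object.metadata.name }}"
        namespace: tenants
        synchronize: true
        data:
          metadata:
            labels:
              example.com/tenant: "{{ request.object.metadata.name }}"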

adberger avatar Nov 05 '25 12:11 adberger