argo-cd
Cluster names cannot be used via RBAC
If you are trying to resolve an environment-specific issue or have a one-off question about an edge case that does not require a feature, please consider asking in the Argo CD Slack channel.
Relevant Slack thread for this issue, with @jannfis
Checklist:
- [x] I've searched in the docs and FAQ for my answer: https://bit.ly/argocd-faq.
- [x] I've included steps to reproduce the bug.
- [x] I've pasted the output of `argocd version`.
Describe the bug
The RBAC policy for clusters still requires you to use a URL, even though names are now the convention.
To Reproduce
1. Create a policy like the one below:
   p, testaccount, clusters, get, in-cluster, allow
2. Check whether testaccount is able to see the cluster - it will not be able to.
3. Change the policy to the URL syntax:
   p, testaccount, clusters, get, https://kubernetes.default.svc, allow
4. Validate that testaccount is now able to see the cluster.
Expected behavior
The ability to use names and URLs interchangeably for cluster references in RBAC.
Screenshots
Would just be redundant.
Version
argocd: v1.7.6+b04c25e
BuildDate: 2020-09-19T00:50:44Z
GitCommit: b04c25eca8f1660359e325acd4be5338719e59a0
GitTreeState: clean
GoVersion: go1.14.1
Compiler: gc
Platform: linux/amd64
argocd-server: v1.7.8+ef5010c
BuildDate: 2020-10-15T22:34:12Z
GitCommit: ef5010c3a0b5e027fd642732d03c5b0391b1e574
GitTreeState: clean
GoVersion: go1.14.1
Compiler: gc
Platform: linux/amd64
Ksonnet Version: v0.13.1
Kustomize Version: {Version:kustomize/v3.6.1 GitCommit:c97fa946d576eb6ed559f17f2ac43b3b5a8d5dbd BuildDate:2020-05-27T20:47:35Z GoOs:linux GoArch:amd64}
Helm Version: version.BuildInfo{Version:"v3.2.0", GitCommit:"e11b7ce3b12db2941e90399e874513fbd24bcb71", GitTreeState:"clean", GoVersion:"go1.13.10"}
Kubectl Version: v1.17.8
I am taking a look at this; I'm pretty familiar with cluster names and URLs.
I don't know what kind of rules a user will define for cluster access: they could define all rules with cluster URLs, all with cluster names, or a mix. So my initial idea was to enforce against both the URL and the name, and if either one gives a deny, the action is not allowed (no matter what the other result is). But there are scenarios where that doesn't work, like this one:
- we have a cluster named minikube with URL http://192.168.0.1
- the only rule for user alice is:
p, alice, clusters, get, minikube, allow
and this rule should allow the user to see the minikube cluster. When I enforce on the cluster URL I get a deny, and when I enforce on the cluster name I get an allow, so with my logic I would deny this, which contradicts the rule's intention.
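The contradiction can be sketched with a tiny stand-in for the enforcer. This is not Argo CD's actual code: `rule` and `enforce` are simplified, hypothetical helpers that only match an exact object or the `*` wildcard, whereas the real implementation goes through Casbin.

```go
package main

import "fmt"

// rule is a simplified, hypothetical policy row: an object pattern
// plus its effect (allow or deny).
type rule struct {
	obj   string
	allow bool
}

// enforce returns the effect of the first rule whose pattern matches obj
// (exact match or "*"); no match means the default deny.
func enforce(rules []rule, obj string) bool {
	for _, r := range rules {
		if r.obj == obj || r.obj == "*" {
			return r.allow
		}
	}
	return false
}

func main() {
	// alice's only rule: p, alice, clusters, get, minikube, allow
	rules := []rule{{"minikube", true}}

	byName := enforce(rules, "minikube")          // allow
	byURL := enforce(rules, "http://192.168.0.1") // deny (no matching rule)

	// "deny wins": require both checks to pass - this locks alice out
	// even though she was explicitly granted access by name.
	fmt.Println("deny-wins:", byName && byURL)
	// "allow wins": either check passing is enough - works for this
	// particular rule set.
	fmt.Println("allow-wins:", byName || byURL)
}
```

Under a strict "deny wins" combination the user is denied despite an explicit allow, while "allow wins" happens to produce the intended result for this rule set.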
So I could go with the idea that if I find an allow, then I allow the action. But then I can have policies like the ones below, which will not work correctly:
p, alice, clusters, get, *, allow
p, alice, clusters, get, minikube, deny
On the above, alice should not see the minikube cluster, but she will be able to because I found an allow rule.
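The failure mode with wildcard policies can be sketched the same way. Again, this is a simplified, hypothetical evaluator and not the real Casbin-backed one; it only illustrates why "first allow wins" ignores an explicit deny:

```go
package main

import "fmt"

// rule is a simplified, hypothetical policy row: an object pattern
// plus its effect (allow or deny).
type rule struct {
	obj   string
	allow bool
}

// enforceAllowWins permits the action if ANY matching rule allows it.
// Deny rows are present in the policy but never consulted.
func enforceAllowWins(rules []rule, obj string) bool {
	for _, r := range rules {
		if (r.obj == obj || r.obj == "*") && r.allow {
			return true
		}
	}
	return false
}

func main() {
	// p, alice, clusters, get, *, allow
	// p, alice, clusters, get, minikube, deny
	rules := []rule{{"*", true}, {"minikube", false}}

	// The wildcard allow matches, so the explicit deny for minikube is
	// never honored: alice can still see the minikube cluster.
	fmt.Println(enforceAllowWins(rules, "minikube"))
}
```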
I guess more specific rules should somehow carry more weight. I see Casbin has an API to check for specific rules (https://casbin.org/docs/en/management-api#hasnamedpolicy), but I am not sure this is the right direction; it seems it would get complex to build something that extends enforcement.
Any other ideas?
We actually build all of our OIDC group scopes so that we allow clusters by group. When the regex implementation is decided on as a replacement for glob, that should address your issue inherently. Until then we are just being verbosely specific. It will be easier with just names too.
This is related to the action we are trying to validate, not to the policies. The get cluster action can be validated against the cluster URL or against the cluster name, and the two can have different results. The question is which one we choose as correct, and I can't find a good balance so far. I don't see how introducing regex is going to help; that would change the policies, not the action.
> So I could go with the idea that if I find an allow, then I will allow the action. But then I can have policies like below which will not work ok:
> p, alice, clusters, get, *, allow
> p, alice, clusters, get, minikube, deny
> On the above, alice should not see the minikube cluster, but she will be able to because I found an allow rule.
I think this is due to the Casbin model we use. Currently we are using `allow-and-deny`, but for a policy such as the above, we should migrate to a `deny-override` model.
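For reference, the policy-effect expressions differ only in their matcher. The snippets below use standard Casbin model syntax (a real model file contains a single [policy_effect] section; the three variants are shown side by side only for comparison):

```
# allow-override: any matching allow row permits the action,
# deny rows are never consulted
[policy_effect]
e = some(where (p.eft == allow))

# allow-and-deny: needs at least one matching allow and no matching deny
[policy_effect]
e = some(where (p.eft == allow)) && !some(where (p.eft == deny))

# deny-override: everything is allowed unless a deny row matches
[policy_effect]
e = !some(where (p.eft == deny))
```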
@jannfis I still don't think it is related to the policies. For example, the line below checks the get action for clusters against q.Server (which is the cluster URL). I wanted to also check against q.Name, but then I would have two results that might contradict each other, and I can't find a good logic to combine them: https://github.com/argoproj/argo-cd/blob/master/server/cluster/cluster.go#L107
I think it would be a bad policy to deny `https://kubernetes.default.svc` but allow `in-cluster` (or vice versa).
So can we safely assume the following?
errSrv := s.enf.EnforceErr(ctx.Value("claims"), rbacpolicy.ResourceClusters, rbacpolicy.ActionGet, q.Server)
errNm := s.enf.EnforceErr(ctx.Value("claims"), rbacpolicy.ResourceClusters, rbacpolicy.ActionGet, q.Name)
if errSrv != nil && errNm != nil {
return nil, errSrv
}
// we had at least one match, either on server or name
...
In such a case, if you want to deny only the production cluster and allow access to all the others, what would the policy be?
The policy below gives an allow on the URL and a deny on the name, so the final result would be allow, but I think the user wanted no access to production (no matter whether the URL or the name was used).
p, alice, clusters, get, *, allow
p, alice, clusters, get, production, deny
So I think we will end up using both the URL and the name to deny.
`deny` rules won't override an existing `allow` rule with our current RBAC model.
This is what I was trying to say in https://github.com/argoproj/argo-cd/issues/4712#issuecomment-724176472
I made a unit test specific to this case, using the current RBAC logic.
The policy is:
p, mike, clusters, get, *, allow
p, mike, clusters, get, https://192.168.0.1, deny
And I will query for the cluster:
clusterapi.ClusterQuery{
Name: "minikube",
Server: "https://192.168.0.1",
}
The enforcement will fail. You can see the test here: https://github.com/lcostea/argo-cd/blob/lcostea/rbac_cluster_names/server/cluster/cluster_test.go#L149
So the `require.NoError(t, err)` does fail?
That is weird. I just tested the following policy with two apps in the cluster, `guestbook` and `guestbook2` (`policy.default` is set to empty):
p, role:user, applications, get, *, deny
p, role:user, applications, get, guestbook, allow
g, test, role:user
Logged in as local user `test`:
$ ./dist/argocd app list
NAME CLUSTER NAMESPACE PROJECT STATUS HEALTH SYNCPOLICY CONDITIONS REPO PATH TARGET
$ ./dist/argocd app get guestbook
FATA[0000] rpc error: code = PermissionDenied desc = permission denied: applications, get, default/guestbook, sub: test, iat: 2020-11-09T20:22:29+01:00
Likewise, when I change the policy to
p, role:user, applications, get, *, allow
p, role:user, applications, get, guestbook, deny
g, test, role:user
the outcome is:
./dist/argocd app list
NAME CLUSTER NAMESPACE PROJECT STATUS HEALTH SYNCPOLICY CONDITIONS REPO PATH TARGET
guestbook https://kubernetes.default.svc guestbook default OutOfSync Missing <none> <none> https://github.com/argoproj/argocd-example-apps guestbook HEAD
guestbook2 https://kubernetes.default.svc guestbook2 gpg OutOfSync Missing <none> <none> https://github.com/argoproj/argocd-example-apps guestbook HEAD
$ ./dist/argocd app get guestbook
Name: guestbook
Project: default
Server: https://kubernetes.default.svc
....
Yes, it fails. But if you say the above scenario with applications is correct, I will check the code again; maybe I am missing some initialisations in RBAC. Thanks for the help.
Sorry, my fault. I used the wrong format for the policy. Please ignore everything I wrote about allow/deny.
Just ran into this since I couldn't use cluster names in declaratively defined AppProjects (though I can on Applications).
Almost the same here - I want to allow OIDC groups to be able to see just the clusters they can use, but I don't have the URL at this point (and with AWS load balancers, the cluster name is not part of the URL, so regex can't help). Referencing clusters by name in RBAC policies would really come in handy :) Any news on this?
Can confirm this is present on `v2.7.6`. It only applies when restricting access to `clusters` in the RBAC policy. It does not apply to `projects` and `applications`.
E.g.:
p, role:my-team, clusters, get, my-tenant-cluster, allow ❌
p, role:my-team, projects, get, my-project, allow ✅
p, role:my-team, applications, get, my-project/*, allow ✅
Can also confirm that using cluster names as the destination in `Application` and `AppProject` manifests does work.
It happens on `v2.8.2+dbdfc71` too; we had to pass the server URL instead of the name to restrict cluster view!
> Just ran into this since I couldn't use cluster names in declaratively defined AppProjects (though I can on Applications).

> Almost the same here - I want to allow OIDC groups to be able to see just the clusters they can use, but I don't have the URL at this point (and with AWS load balancers, the cluster name is not part of the URL, so regex can't help). Referencing clusters by name in RBAC policies would really come in handy :) Any news on this?
Also an issue with AWS EKS, as EKS provides you with the URL for the k8s API.
From an intuition standpoint, I wasn't expecting the URL to be what it is evaluated against and was slightly confused at first, though trying the URL was my next instinct anyway.
This URL is randomly generated, so it's not really possible to know what it might be in advance, nor is it really possible to regex/wildcard off something generated.
Simply wrapping a CNAME around that generated URL won't work either, as the CN won't match and certificate verification fails, though I'm wondering if that's overridable somehow. I only noticed this might be possible looking at https://github.com/oboukili/terraform-provider-argocd/blob/master/docs/resources/cluster.md#nested-schema-for-configtls_client_config
If that doesn't work then I'd have to run a proxy or something. And this is just my case, which is perhaps a subset of others who wish to evaluate by cluster name for whatever other reasons.
Edit: I was able to make this work; I edited the cluster secret to change the URL to my CNAME and added `serverName` to the `tlsClientConfig`. That helps my situation a bit but doesn't solve the original issue :)
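For anyone attempting the same workaround, a declarative cluster secret along these lines should capture it. The field layout follows Argo CD's documented cluster secret format, but every name and value below is a placeholder:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: my-cluster-secret        # placeholder
  namespace: argocd
  labels:
    argocd.argoproj.io/secret-type: cluster
type: Opaque
stringData:
  name: my-cluster               # cluster name shown in the UI/CLI
  # CNAME you control, pointing at the generated EKS endpoint
  server: https://k8s.my-cluster.example.com
  config: |
    {
      "bearerToken": "<auth-token>",
      "tlsClientConfig": {
        "insecure": false,
        "serverName": "<original-generated-eks-hostname>",
        "caData": "<base64-encoded-ca-certificate>"
      }
    }
```

Setting `serverName` makes TLS verification check the certificate against the original endpoint hostname rather than the CNAME used to connect.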
Can we have this check against the name instead, or some method of checking both? Of course I understand that just switching to names would have backwards-compatibility issues, and that doing an OR on either could surprise users unaware of the change and result in a looser role than intended. But in a similar vein, I was able to add `preferred_username` under `data.scopes` in the policy ConfigMap to let me authorize against usernames when using SSO with Dex+GitLab. Perhaps an option added there to choose which to use for clusters might work? Something like `data.useClusterName: [boolean]`?
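To make the analogy concrete: the `scopes` override mentioned above is a real key in the `argocd-rbac-cm` ConfigMap, while `useClusterName` is only the hypothetical option proposed in this comment and does not exist in Argo CD:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-rbac-cm
  namespace: argocd
data:
  # existing option: which token claims RBAC subjects are matched against
  scopes: '[preferred_username, groups]'
  # hypothetical option sketched in this comment - NOT implemented
  # useClusterName: "true"
```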