Fine-grained RBAC for Argo Server
Summary
The current model of RBAC leans heavily on Kubernetes RBAC, so it is easy to make secure, but it may not scale well.
Consider the situation where you have 1000s of teams and 1000s of namespaces. At the very least you may need an OIDC group for each namespace, and then a service account, role, and role binding.
It may be better to use Casbin (or similar) to provide a more flexible way to configure this.
Use Cases
When would you use this?
I would like to add onto this. In our setup, each "team" is given a namespace, and each "role" within that team (developer, admin, etc.) is mapped in our SSO onto a k8s service account, with a matching role and rolebinding. The issue arises when a user belongs to more than one team: there's no way to give them multiple service accounts, so the only options are to:
- Generate one service account per user (defeating the purpose of SSO)
- Generate one (service account, role, rolebinding) triple per possible combination of SSO roles (combinatorial explosion)
- Abandon SSO
- Disallow users from belonging to more than one team/role
I hope you'll agree none of these are acceptable outcomes. We agree that using k8s-native RBAC makes a lot of sense for a k8s-native app, so here are some potential fixes we've thought up:
First, consider adding an annotation to service accounts that, if present, indicates that all service accounts matching the criteria are selected, instead of just the first according to priority. From there, each SA would be tried in order of priority. If a "permission denied"-type error is received, the next would be tried. If the request succeeds, a non-"permission denied"-type error is received, or the list is exhausted, the current behavior would take place. While this would multiply the time each request takes by the number of matched SAs, I think it would allow any arbitrary combination of roles (assuming one role per SA) across namespaces to be assigned to any given user.
Another possibility would be to add an annotation to service accounts to indicate that the service account only be considered if the request is for one of a set of namespaces, and the server would keep a cache of matched service accounts for each user, instead of just one, and select the most appropriate one for each request based on namespace and priority. While this would only allow one role (assuming one role per SA) per namespace, that seems like a much friendlier restriction than one SA per cluster, and the mapping of namespace -> SA could be done fairly efficiently.
Finally, the server could dynamically generate SA's on a per-session basis based on OIDC claims, and instead collect a set of (cluster)roles to bind to those generated SA's in the same way that SA's are currently selected (but allowing multiple roles to be selected). This might be more permissions than some admins would wish to give to argo, but would allow a clean mapping of SSO roles/groups to k8s roles.
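To make the first proposal concrete, here is a minimal sketch of the "try each matched SA in priority order until one is authorized" fallback. Everything here is hypothetical: the ServiceAccount type, the tryInOrder function, and the way priority is surfaced are stand-ins, not Argo's actual code.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// ServiceAccount is a simplified stand-in for a matched Kubernetes
// service account; Priority would come from the existing precedence
// annotation on the account.
type ServiceAccount struct {
	Name     string
	Priority int
}

// ErrPermissionDenied stands in for a Kubernetes 403 response.
var ErrPermissionDenied = errors.New("permission denied")

// tryInOrder walks the matched service accounts from highest to lowest
// priority and returns the first one whose request does not fail with a
// permission error. Success, a non-permission error, or exhaustion ends
// the walk, matching the proposed fallback behavior.
func tryInOrder(sas []ServiceAccount, do func(ServiceAccount) error) (string, error) {
	sort.Slice(sas, func(i, j int) bool { return sas[i].Priority > sas[j].Priority })
	var lastErr error = ErrPermissionDenied
	for _, sa := range sas {
		err := do(sa)
		if errors.Is(err, ErrPermissionDenied) {
			lastErr = err
			continue // try the next-highest-priority account
		}
		return sa.Name, err // success or a non-permission error: stop here
	}
	return "", lastErr
}

func main() {
	sas := []ServiceAccount{{"team-a", 1}, {"team-b", 2}}
	// Simulate: team-b (higher priority) is denied, team-a succeeds.
	name, err := tryInOrder(sas, func(sa ServiceAccount) error {
		if sa.Name == "team-b" {
			return ErrPermissionDenied
		}
		return nil
	})
	fmt.Println(name, err)
}
```

As noted above, the cost is one attempted request per matched SA in the worst case, in exchange for supporting arbitrary role combinations.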
Dear @alexec,
first of all, thanks a lot for that great piece of software you and the community are providing. We started to use it for some simple Workflows and now we're already looking into sophisticated ones with dozens of steps/tasks and dependencies. The Kubernetes-native nature is just a cherry on top of its capabilities. GREAT WORK!
So I've now given credit to the Ferrari under the hood you've built. Nevertheless, to be able to roll it out completely in our production environments, we're missing some UI RBAC capabilities/controls. As is usual in the financial industry and larger companies subject to regulation, there are audit and compliance rules to follow.
With respect to segregation of duties and separate ownership of different Workflows/WorkflowTemplates, we need to cover the following requirements:
- The UI should have a read-only mode
- Kubernetes objects like Workflows, WorkflowTemplates, CronWorkflows, and so on should be immutable in the UI (only parameters and metadata --> but no YAML edit)
- A logged-in user should only see the Workflows they are allowed to list (even in a cluster install)
- It should be possible to segregate between admins, operators, and viewers
ALL ROUTES LEAD TO ROME. There are multiple ways to incorporate these capabilities into Argo Workflows. Here 2 suggestions:
Option 1. Predefine Permission Matrix for UI Capabilities
As an example, three predefined permission groups:
- Admin
- Operator
- Read-Only
Admin: Can fully use the current UI and its capabilities, like submitting new Workflows, editing JSON/YAML, uploading files, deleting, terminating, and so on.
Operator: Can submit, resubmit, terminate, and delete workflows, but cannot create custom resources like workflows, workflow templates, sensors, and so on. Also no edit possibility for the "full workflow options" --> YAML is read-only.
Read-Only: Can only view and list the resources the logged-in user is allowed to via their Kubernetes RBAC settings.
These UI permission groups could be assigned via annotations directly on the service accounts. If none is assigned, read-only could be the default, e.g. workflows.argoproj.io/rbac-ui-permission-group: "admin"
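A minimal sketch of how the server could resolve the proposed annotation, assuming the annotation key and group names above. None of this exists in Argo today; the function name is hypothetical.

```go
package main

import "fmt"

// groupAnnotation is the hypothetical annotation key from Option 1.
const groupAnnotation = "workflows.argoproj.io/rbac-ui-permission-group"

// validGroups are the three proposed predefined permission groups.
var validGroups = map[string]bool{"admin": true, "operator": true, "read-only": true}

// uiPermissionGroup resolves the UI permission group from a service
// account's annotations, defaulting to read-only as proposed when the
// annotation is absent or unrecognized.
func uiPermissionGroup(annotations map[string]string) string {
	if g, ok := annotations[groupAnnotation]; ok && validGroups[g] {
		return g
	}
	return "read-only"
}

func main() {
	fmt.Println(uiPermissionGroup(map[string]string{groupAnnotation: "admin"}))
	fmt.Println(uiPermissionGroup(nil))
}
```

Defaulting to read-only on a missing or unknown value keeps the failure mode safe: a misconfigured service account loses capabilities rather than gaining them.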
Pros:
- relatively easy to implement and roll out
Cons:
- still no possibility of fine-grained RBAC settings on different resources
- no multi-namespace configuration possible, because a user is always mapped to exactly one service account
Option 2. Use a similar RBAC configuration approach to Argo CD (probably what's intended in your issue description)
Define a ConfigMap with additional, fine-grained RBAC roles. A default role:readonly could also be applied.
RBAC Permission Structure
Permission policy definitions should be possible either for all resources or namespace-scoped (see next bullet):
p, <role/user/group>, <resource>, <action>, <object>
Namespace scoped Permission policy definition:
p, <role/user/group>, <resource>, <action>, <namespace>/<object>
RBAC Resources and Actions
Ideally the other resources like events and sensors should also be covered, because they're also handled in the UI, even if the CRDs are not in argo-workflows itself:
clusterworkflowtemplates, cronworkflows, eventbus, eventsources, sensors, workfloweventbindings, workflows, workflowtemplates
Actions: get, create, delete, terminate, submit, edit
- get = get/list resources
- create = create resources
- delete = delete resources
- terminate = terminate a running resource
- submit = submit a workflow with possible parameter settings (no YAML edit option)
- edit = possible to submit and edit the YAML / workflow options
Here is an example of how such a policy could look:
apiVersion: v1
kind: ConfigMap
metadata:
  name: argoworkflow-rbac-cm
  namespace: <yournamespace>
data:
  policy.default: role:readonly
  policy.csv: |
    p, role:workflow-admin, clusterworkflowtemplates, *, *, allow
    p, role:workflow-admin, cronworkflows, *, *, allow
    p, role:workflow-admin, eventbus, *, *, allow
    p, role:workflow-admin, eventsources, *, *, allow
    p, role:workflow-admin, sensors, *, *, allow
    p, role:workflow-admin, workfloweventbindings, *, *, allow
    p, role:workflow-admin, workflows, *, *, allow
    p, role:workflow-admin, workflowtemplates, *, *, allow
    p, role:workflow-ops, workflows, get, *, allow
    p, role:workflow-ops, workflows, delete, *, allow
    p, role:workflow-ops, workflows, submit, *, allow
    p, role:workflow-ops, workflows, terminate, *, allow
    p, role:workflow-team-blue-scoped, workflows, *, targetnamespace-blue/*, allow
    p, role:workflow-team-red-scoped, workflows, *, targetnamespace-red/*, allow
    g, your-admin-group, role:workflow-admin
    g, your-workflow-ops-group, role:workflow-ops
    g, your-team-blue-scoped-group, role:workflow-team-blue-scoped
    g, your-team-red-scoped-group, role:workflow-team-red-scoped
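In a real implementation Casbin would evaluate these policy lines. Purely to illustrate the intended semantics of the `p, <role>, <resource>, <action>, <object>` entries above, here is a toy matcher; the types and wildcard handling are simplified assumptions, not Casbin's actual matching model.

```go
package main

import (
	"fmt"
	"strings"
)

// policy mirrors one "p, <role>, <resource>, <action>, <object>" line.
type policy struct{ role, resource, action, object string }

// match supports the two wildcard forms used in the example policy:
// a bare "*" and a trailing "<namespace>/*".
func match(pattern, value string) bool {
	if pattern == "*" {
		return true
	}
	if strings.HasSuffix(pattern, "/*") {
		return strings.HasPrefix(value, strings.TrimSuffix(pattern, "*"))
	}
	return pattern == value
}

// allowed reports whether any policy line grants the role the given
// action on the resource/object, in the spirit of the ConfigMap above.
func allowed(policies []policy, role, resource, action, object string) bool {
	for _, p := range policies {
		if p.role == role && match(p.resource, resource) &&
			match(p.action, action) && match(p.object, object) {
			return true
		}
	}
	return false
}

func main() {
	policies := []policy{
		{"role:workflow-ops", "workflows", "submit", "*"},
		{"role:workflow-team-blue-scoped", "workflows", "*", "targetnamespace-blue/*"},
	}
	// Ops may submit anywhere; team-blue may act only in its namespace.
	fmt.Println(allowed(policies, "role:workflow-ops", "workflows", "submit", "default/my-wf"))
	fmt.Println(allowed(policies, "role:workflow-team-blue-scoped", "workflows", "delete", "targetnamespace-red/x"))
}
```

The group ("g") lines would be resolved separately, mapping an SSO group onto one of these roles before evaluation.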
Pros:
- fine-grained permissions possible
- user can handle resources over x namespaces
Cons:
- the implementation/integration effort is probably higher
Option 1 is more to be seen as a "quick fix".
Option 2 is definitely the preferred and bulletproof approach (as seen in Argo CD).
BTW: one part of the setup I don't completely understand: why is a dedicated namespaced install needed? If you have clear K8s RBAC settings (which you obviously have), plus additional groups/policies on top, you could always install it cluster-wide and let the RBAC rules and policies do their job.
@HouseoLogy I don't have much to add to what you said. It is basically correct.
Today's solution is intentionally frugal. Basically, we have service accounts with annotations, and those annotations select a service account for you to use. The upside is that we lean on Kubernetes RBAC (less effort, especially on critical security code); the downside is that it does not scale well, ultimately requiring one service account for each user, with each user's permissions copied into that account.
There are other solutions I've not mentioned:
If the problem is operational overhead, we could consider using impersonation. Basically, the user must create a service account (typically within their own namespace). The argo-server service account could then use impersonation to become that user. This moves the work from the operator to the user, and enables the user to self-serve.
We could look at using Kubernetes SSO. I don't know much about this, but it looks like it is not well supported.
Finally, we could have the option to use Casbin. I envisage this implemented as an HTTP interceptor, so the resources are the API URLs, not Kubernetes resources.
@HouseoLogy this issue is not currently on the core team's roadmap, so won't get looked at > 6 months. But, we do want to do more collaborative features, where someone from the community does the implementation with the guidance from the core team. Would you be interested?
I like @HouseoLogy's idea of using Argo CD's approach; we've used it extensively for more than a year with no complaints. If it isn't already, perhaps that behavior could be extracted into https://github.com/argoproj/pkg for shared, generic use.
@alexec Because this is such a blocker for us, I'd be happy to offer some time in implementing one of these solutions.
Thank you @andrewm-aero. I think we probably need to nail down whether or not "resources" are defined as "API URLs" or as "Kubernetes resources"; I think this is a one-way-door decision:
- The latter is more work - it cannot be done as an interceptor. It must be implemented in code.
- It is more porous - it's hard to know which API endpoints change which resources.
- It will break - it is easy to change code functionality (e.g. to read a new resource) and forget to change the security enforcement.
But... it may not be what users want.
If we use "API URLs" then we may need to change some of those URLs to include namespace (for example).
@andrewm-aero have you managed to start the dev set-up?
Hello @alexec & @andrewm-aero, thanks a lot for the fast feedback, and @andrewm-aero for offering time to implement one of these solutions. I probably wouldn't be a huge help in this area, because Go isn't my domain and I'm currently overloaded with work in my daily business. Nevertheless, I'll follow this issue, and if I'm able to free up some time for contribution I'll ping you.
As suggested, I would definitely go with the Casbin / Argo CD approach and not implement it as dedicated Kubernetes resources.
I think I might make a U-turn on saying we should do URLs:
- This cannot support clients using gRPC.
- This would mean we'd have to change the artifact and archive URLs.
- It's much nicer to be working with verb+resource+namespace+name.
I've created a PoC. I do not plan to complete this. Would anyone like to volunteer to take this over the finish line?
Hey @alexec! I would like to offer some help by taking over your PoC. We would also greatly benefit from this feature at my company 👍
@jeanlouhallee I'm assuming you've done some Golang before? If so, step 1 is to read the CONTRIBUTING.md guide - clone the source code onto your local machine and check out the casbin-poc branch. You'll need to start it using make start PROFILE=sso.
The branch has a number of TODOs left on it which need completing. It'll also need testing with a ConfigMap volume mounted at /casbin.
We’ll want to get some community members to test it too. We can figure out the details closer to the time.
Thanks for the pointers @alexec. I have done a bit of Golang, but not much. Will learn a lot by diving into this.
Hi @jeanlouhallee thank you!
Hello @alexec, do you mind if I ping you directly on Slack for design questions/considerations?
Sure. That's fine.
Hi @jeanlouhallee & @alexec, were you able to agree on certain design considerations? Do you have any further insights?
We have added a new feature related to SSO: https://github.com/argoproj/argo-workflows/blob/master/docs/argo-server-sso.md#sso-rbac-namespace-delegation
@HouseoLogy @andrewm-aero Would this help in your use case?
For all those watching, I have created PR #7193 that implements "SSO Impersonation" support in argo-server by using Kubernetes SubjectAccessReviews with the user's email or sub OIDC claim. This means that argo-server access can be managed by standard Kubernetes RoleBindings and ClusterRoleBindings with your user's email in the subjects.
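For illustration, a SubjectAccessReview of the kind the PR describes might look like the following. The user, verb, and namespace values here are examples; the exact fields argo-server sets are defined in the PR itself.

```yaml
apiVersion: authorization.k8s.io/v1
kind: SubjectAccessReview
spec:
  # the email (or sub) OIDC claim of the logged-in user
  user: "[email protected]"
  resourceAttributes:
    group: argoproj.io
    resource: workflows
    verb: list
    namespace: data
```

The API server answers with status.allowed, which argo-server can then use to decide whether to perform the action on the user's behalf.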
NOTE: these changes apply equally to the Argo UI and CLI, as they affect all K8s API calls made by argo-server on behalf of users.
Amazing stuff! Kiali already supports impersonation with Kubernetes OIDC.
I had a good experience configuring Kiali's authentication.
Usually, OIDC is already configured for authentication in the Kubernetes cluster.
When Kiali uses the same app as the cluster's authentication, it maps onto the RoleBindings defined in the cluster. Thus, users can use the same privileges they have with kubectl.
In my case, there are hundreds of users in the cluster, so permissions are managed via OIDC groups, and the RoleBindings are managed in the cluster.
You can set this up in kube-apiserver with --oidc-groups-claim.
Here is a RoleBinding example:
subjects:
- kind: Group
  name: "frontend-admins"
  apiGroup: rbac.authorization.k8s.io
If Argo Workflows can follow this flow, it will be very helpful in using it.
I think the method provided by the current Argo Workflows (v3.2) has management limitations, so I plan to operate it in client auth-mode.
@DingGGu as described in the "limitations" section of my PR https://github.com/argoproj/argo-workflows/pull/7193, the initial implementation won't support Group bindings, only User. This is because it uses SubjectAccessReviews, which require us to explicitly pass the list of groups a user is in, and I am not sure of the best way to check that.
I see two possibilities:
- We run some kind of K8s query based on the User (email or sub), which checks what Groups that user is in (not good because it requires another K8s API call)
- We add an alternate mode to the "impersonate" feature which extracts the groups OIDC claim, and runs a SubjectAccessReview on the groups rather than just email or sub.
@thesuperzapper Looking at your PR comments and the limitations of SubjectAccessReview, I wonder if this is the correct implementation.
My current configuration is as follows.
kube-apiserver
--oidc-issuer-url=https://<oidc.provider.com>
--oidc-username-claim=email
--oidc-groups-claim=groups
--oidc-groups-prefix='oidc:'
The JWT token is properly authenticated to Kubernetes, and its payload is:
{
  // ...
  "email": "[email protected]",
  "groups": ["ns-data-readonly", "ns-workflow-writer"],
  // ...
}
The RoleBinding of the namespace is set as follows.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: oidc:namespace-readonly
  namespace: data
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: oidc:namespace-readonly
subjects:
- kind: Group
  name: oidc:ns-data-readonly
  apiGroup: rbac.authorization.k8s.io
In my use case, the group name in the RoleBinding and the group name provided by OIDC are slightly different.
Note the oidc:ns-data-readonly in the RoleBinding versus the ns-data-readonly in the OIDC groups.
This difference is configured with --oidc-groups-prefix in kube-apiserver.
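To make the prefixing concrete, here is a tiny sketch of what the groups-prefix setting effectively does to the groups claim before RBAC evaluation. This mimics the observable behavior, not kube-apiserver's actual code.

```go
package main

import "fmt"

// prefixGroups mimics kube-apiserver's OIDC groups prefixing: every
// group from the OIDC "groups" claim is prefixed before RBAC
// evaluation, so RoleBindings must reference the prefixed name.
func prefixGroups(prefix string, groups []string) []string {
	out := make([]string, 0, len(groups))
	for _, g := range groups {
		out = append(out, prefix+g)
	}
	return out
}

func main() {
	// With --oidc-groups-prefix='oidc:', the claim above becomes:
	fmt.Println(prefixGroups("oidc:", []string{"ns-data-readonly", "ns-workflow-writer"}))
}
```

This is why the RoleBinding subject above is oidc:ns-data-readonly while the raw claim contains only ns-data-readonly.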
In Kiali, even if a prefix is set, permissions are granted and work properly.
They use SelfSubjectAccessReviews, not SubjectAccessReviews.
https://github.com/kiali/kiali/blob/fb20c789419ba0950cb6a3a5c5d296e1df778e58/kubernetes/kubernetes.go#L472-L508
Most users will not be able to reconfigure the Kubernetes API server (e.g. on EKS, GKE Autopilot, etc.). How do these ideas work for them?
There will be many users in enterprise environments who do not have the option to change this, regardless of whether it is available.
The EKS and kOps clusters I'm using have all had it applied via in-place upgrades.
Why this option is useful is also explained in the EKS documentation.
It prevents someone from accidentally granting permissions to groups starting with system: in the OIDC provider.
My guess is that Kubernetes was also aware of this problem and provided the prefix option, so there doesn't seem to be any need to rule this option out.
The reason a cluster administrator wants to use "impersonate" is that, to use another web console, there is no need to add or change the authorization scheme already configured in Kubernetes. That means users can use the same privileges granted to them in kubectl.
I manage hundreds of developers across dozens of clusters, and if this feature is added correctly, I think management will be very efficient.
Also, other cluster administrators will be happy to change the kube-apiserver options if they know of these advantages.
Usually kube-apiserver consists of 3 pods, so there is no difficulty in changing it. (In my experience, at least.)
@DingGGu in general, I would strongly discourage the use of the Kubernetes "impersonate" feature, see my comments here about why https://github.com/argoproj/argo-workflows/pull/7193#issuecomment-964641307.
@thesuperzapper My "impersonate" literally means delegation. The word "impersonate" has a connotation, but it is not meant to imply deception.
SelfSubjectAccessReview performs its check based on the user's own token.
Even if you are a cluster administrator, if you do not have the user's token, you cannot impersonate someone else and execute their privileges.
I think the Argo Server would instead execute the commands coded into it through that token.
Depending on your point of view, you might think that Argo Server could run beyond the set privileges, but I think there is no way to know and import groups in advance unless you use the SelfSubjectAccessReview method.
Also, the cluster administrator should have a good understanding of which privileges (or K8s APIs) Argo Server uses.
@DingGGu I am specifically talking about Kubernetes impersonation (not the concept of impersonation more generally). The issue with it is that if something is given access to the impersonate verb, it can impersonate literally any user, and therefore escalate its own permissions (which is why I don't want to give argo-server access to the impersonate verb).
Also, I think you misunderstand how the PR for SubjectAccessReview works: it uses the argo-server ServiceAccount to make a SubjectAccessReview against the "username" from the user's JWT. It does this to verify that the caller is actually allowed to take the action, and if they are, it acts on their behalf to take that action.
@alexec thank you for involving me in this discussion, it's definitely interesting!
I totally agree with @HouseoLogy in his initial comment:
The Option 1 (Predefine Permission Matrix for UI Capabilities) is more to be seen as a "quick fix".
The Option 2 (Use similar RBAC Configuration Approach like Argo CD) is definitely preferred and the bulletproof approach (as seen in Argo CD)
@thesuperzapper's PR #7193 is good, but from a user perspective it is pretty complex and cumbersome compared to a solution based on Casbin.
I think the Argo Workflows RBAC leveraged in the UI should be completely separate from and independent of Kubernetes, especially because coupling to K8s APIs (and feature flags, from what I understood) could greatly limit the usage of Argo Workflows in on-prem setups, but also in AKS, EKS, and GKE, where it's not immediate, easy, or flexible to play with feature flags. Very often, if you want to enable/disable such flags, you have to destroy and recreate an entire cluster, which is not a "nice option".
@jeanlouhallee and @alexec what's the status of the Casbin PoC?
@bygui86 Haven't made much progress on my side. I unfortunately had to reprioritize and work on other things. Even though I still have a lot of interest in implementing Casbin similar to Argo CD, I can't commit to any timeline for now.