rancher
rancher copied to clipboard
[BUG] Rancher RoleTemplate CRD does not validate roles properly
Rancher Server Setup
- Rancher version: 2.6.9
Describe the bug Rancher RoleTemplate CRD does not validate roles properly.
To Reproduce Create a new RoleTemplate CR such as:
apiVersion: management.cattle.io/v3 builtin: false context: project description: "Test" displayName: Test Project Role external: false hidden: false kind: RoleTemplate metadata: name: test-project-role projectCreatorDefault: false rules:
apiGroups: [] resources: namespaces verbs: create
Result Kubernetes accepts the CR and then Rancher begins failing to apply that Project (or Cluster) Role:
2022/11/30 22:46:14 [ERROR] error syncing 'ztiap-sts': handler namespace-auth: couldn't ensure roles: couldn't update clusterRole test-project-role: couldn't update clusterRole test-project-role: ClusterRole.rbac.authorization.k8s.io "test-project-role" is invalid: rules[0].apiGroups: Required value: resource rules must supply at least one api group, requeuing
I suspect that the Rancher UI is validating that the RoleTemplate is created properly. That validation should be moved into the Kubernetes API rather than the client.
Expected Result
Kubernetes rejects the CR creation because the list of rules
contains an invalid rule (the apiGroups is an empty list)
SURE-5702
Quoting a piece from SURE-5702 which did not make it here
I suspect that the Rancher UI is validating that the RoleTemplate is created properly. That validation should be moved into the Kubernetes API rather than the client.
The implication that the CR was created via kubectl
, and not the RUI, bypassing the suspected validation done by the UI.
Validation in the kube API :arrow_right: webhook. Grepping filenames for roletemplate delivers
./pkg/resources/management.cattle.io/v3/roletemplate/validator.go
as a possible starting point.
Which contains validateCreateFields
called from the validators Admit
.
Nothing in that code seems to look at the rules
.
:warning: The validateUpdateFields
does not seems to prevent editing of the rules of an RT, i.e. to empty apiGroups, or verbs. Although comments somewhere else seem to indicate that this would happen.
Reading more the verb check is in Admit
itself, and not the validate...
functions.
The verb check is done via CheckForVerbs
, in the common support for validators.
It seems to me that an analogous CheckForAPIGroups
should do what is wanted.
A bit more from SURE about reproduction
One note that wasn't on the issue - you need to bind a user to the impacted role template in a project or you won't see an error in the logs (I believe we don't attempt to create backing roles until someone actually has need of those rules).
And about possible solutions
As far as the solution goes, we might be able to get k8s to do the heavy lifting for us if we can get our CRDs to be well-defined, which would avoid custom webhook logic.
What I have done so far was going for the webhook and custom logic.
Researching into the CRD spec direction the CRD looks to be defined via
-
./pkg/crds/yaml/generated/management.cattle.io_roletemplates.yaml
and code under pkg/crds/...
installs these yaml files into the cluster on setup.
The files are apparently embedded into the rancher
binary.
The exact method is actually not relevant to repro and fix.
The files are apparently also generated from somewhere.
This will be relevant to the fix, as the sources would have to be modified.
The location of these sources is currently unknown :warning: :question:
The spec is based on openAPIV3Schema
.
- Starting at (googled: kubernetes crd spec) https://kubernetes.io/docs/tasks/extend-kubernetes/custom-resources/custom-resource-definitions
- It references (at the top) https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.29/#customresourcedefinition-v1-apiextensions-k8s-io
- Following field type references yields https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.29/#jsonschemaprops-v1-apiextensions-k8s-io
- The
minLength
field listed there may be what is needed to constrain the apiGroups property to length 1 and longer.
The existing apiGroups spec is
apiGroups:
description: APIGroups is the name of the APIGroup that contains
the resources. If multiple API groups are specified, any action
requested against one of the enumerated resources in any API group
will be allowed. "" represents the core API group and "*" represents
all API groups.
items:
type: string
type: array
It would have to be changed to
apiGroups:
__minLength: 1__
description: APIGroups is the name of the APIGroup that contains
the resources. If multiple API groups are specified, any action
requested against one of the enumerated resources in any API group
will be allowed. "" represents the core API group and "*" represents
all API groups.
items:
type: string
type: array
Notes:
- I wonder if the existing
checkForVerbs
code could be replaced by an analogous constraint on theverbs
array . - Same for the
resources
array.
Ok. Repro succesful:
2024/01/26 13:17:14 [ERROR] error syncing 'p-p2nxh/prtb-cz899': handler cluster-prtb-sync: couldn't ensure roles: couldn't create clusterRole test-gh-40584-role: ClusterRole.rbac.authorization.k8s.io "test-gh-40584-role" is invalid: rules[0].apiGroups: Required value: resource rules must supply at least one api group, requeuing
After figuring out how to bind a user to the impacted role template
:
UI:
- Create a test user.
- Go to
local
Cluster -Projects/Namespaces
-Create project
. As part of that -
Add the user
, and select the bogus role template as the users permissions. - Done
Fix with custom code (CheckForAPIGroups
analogous to CheckForVerbs
) is successful:
work@tagetarl:~/SUSE/dev/Rancher> k apply -f 40584-roletemplate.yaml Error from server (BadRequest): error when creating "40584-roletemplate.yaml": admission webhook "rancher.cattle.io.roletemplates.management.cattle.io" denied the request: policyRules must have at least one APIgroup: &PolicyRule{Verbs:[create],APIGroups:[],Resources:[namespaces],ResourceNames:[],NonResourceURLs:[],}
Fix with a change to the CRD is successful:
work@tagetarl:~/SUSE/dev/Rancher> k apply -f 40584-roletemplate.yaml kubectl apply -f 40584-roletemplate.yaml The RoleTemplate "test-gh-40584-role" is invalid: rules[0].apiGroups: Invalid value: 0: rules[0].apiGroups in body should have at least 1 items
Note, the minLength
mentioned above is the wrong constraint.
The actual constraint is minItems
.
With both possible fixes working I have questions:
- For one, which of the two is prefered ? a. Code change, or b. CRD change
- If the second, then there are open questions about the setup for the rancher CRDs: a. The CRD resides in pkg/crds/yaml/generated/management.cattle.io_roletemplates.yaml. b. As this file is apparently generated, where are the sources it was generated from ? c. And how would it be generated ? d. Note, depending on what the CRD yaml is generated from it may not be possible to set the desired constraint there. This would force us back to the code change in the webhook.
From comms in slack thread, by @MbolotSuse
- The source for the role template CRD can be found here: https://github.com/rancher/rancher/blob/abc87f5240fd6799b3f485106638f967373a736c/pkg/apis/management.cattle.io/v3/authz_types.go#L216
- After editing that file you should use go generate ./... to regenerate the CRDs (LMK if that doesn't work - you might also be able to use make build-crds for this purpose).
- I would recommend putting this check in the webhook.
- The API Groups check is not the only one performed by k8s.
- We want to make sure that we mirror all of the checks that k8s uses so that we don't see this issue surface in other ways (such as by providing no verbs, or no resources).
- The webhook is also a bit easier to test than CRD level validation.
- @Jonathan Crowther recently made a change for GlobalRoles which made sure those rules were valid according to upstream: https://github.com/rancher/webhook/blob/6d6130f5fe1f70db905a851225d4929855e6dcf2/pkg/resources/management.cattle.io/v3/globalrole/validator.go#L262. I would recommend implementing similar validation for your type.
Me:
Wrt resources it seems that the hook does not check that either. verbs was the only thing check AFAICT.
@MbolotSuse
Yep, I would expect the webhook check to miss a few validations on the rules right now. That's why I linked that code to the GlobalRole stuff - that function uses an upstream library which should match what k8s is looking for 1-1, so that should let you fix the webhook to get a lot without needing to get too deep into the finer details. (edited)
Me:
Ah :+1: Looking at the authz_types.go I see that the type for the rules property comes straight out of the kube rbac types, and these do not contain any constraints wrt minItems, so that explains why whatever tool called by go generate is not adding such to its output.
The global role validator changes came from
- https://github.com/rancher/webhook/commit/b18c38740a52623e44539f7b5ed6eb02c2f53306
- I.e. https://github.com/rancher/webhook/pull/309
Back to In Progress
. While the PR got the requisite approvals a comment pointed to integration tests not seen, which should be extended as well.
Reported an issue in v2.8 related to requeing error messages with missing annotations on a RoleTemplate.
[BUG] : Requeing error messages in Logs when user create RoleTemplates without annotations https://github.com/rancher/rancher/issues/45365
Observation 4: [BUG] : Wrong Inherit RoleTemplates throwing Internal Error exception from webhook https://github.com/rancher/rancher/issues/45371
Note: This is an existing issue
Observation 5 : [BUG] : Inherit RoleTemplates throwing Error exception from webhook without roletemplate name https://github.com/rancher/rancher/issues/45372
Completed validations on v2.9-head, no new issues reported and closing this issue.
v2.9-3c4ccdc5bc9fde3510089153b5ad58fdbe604880-head