terraform-provider-kubernetes
Do not fail instantly on CRD resolution error when handling a manifest
Description
CRD resolution can fail, e.g. when the service account does not have list access to the CRD API. However, this should not prevent the provider from trying to resolve the resource type using the cluster OpenAPI document. This is fixed by moving the error check to after the attempt with OpenAPI.
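For illustration, here is a minimal Go sketch of that reordering; the function names (`lookupCRD`, `lookupOpenAPI`, `resolveType`) are hypothetical stand-ins and do not match the provider's actual internals:

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical stand-ins for the provider's internal lookups,
// simulating a service account without list access to CRDs.
var errForbidden = errors.New("cannot list CRDs: forbidden")

func lookupCRD(gvk string) (string, error)     { return "", errForbidden }
func lookupOpenAPI(gvk string) (string, error) { return "schema-from-openapi", nil }

// resolveType shows the proposed control flow: a CRD resolution error
// is no longer fatal on its own; it is only surfaced if the OpenAPI
// fallback also fails.
func resolveType(gvk string) (string, error) {
	schema, crdErr := lookupCRD(gvk)
	if crdErr == nil {
		return schema, nil
	}
	schema, openAPIErr := lookupOpenAPI(gvk)
	if openAPIErr == nil {
		return schema, nil
	}
	return "", fmt.Errorf("CRD lookup failed (%v); OpenAPI lookup failed (%v)", crdErr, openAPIErr)
}

func main() {
	schema, err := resolveType("v1/PersistentVolumeClaim")
	fmt.Println(schema, err)
}
```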
Acceptance tests
- [ ] Have you added an acceptance test for the functionality being added?
- [ ] Have you run the acceptance tests on this branch?
Output from acceptance testing:
$ make testacc TESTARGS='-run=TestAccXXX'
...
Release Note
Release note for CHANGELOG:
...
References
- #1507
Community Note
- Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
- If you are interested in working on this issue or have submitted a pull request, please leave a comment
Hi,
Thanks for raising this issue. I can see how permissions to list CRDs should not be required as long as one doesn't actually want to manage CRDs. However, the changes proposed in this PR will have unexpected side-effects when used on CRDs with a service account that doesn't have list permissions. CRDs are also included in the OpenAPI document, but their schema is imprecisely translated to OpenAPI v2. In that case, if I understand this correctly, the lookup for the CRD will fail, but then it would be found in the OpenAPI document and thus the provider would end up using the incomplete schema expressed in OpenAPI v2. This will cause the same resource to have different structure based on the permission of the calling identity. Something we obviously cannot have happen.
Unfortunately, even if we were to distinguish by the type of error returned by the CRD lookup (to handle the missing-permissions error specifically), we would still be missing the information of whether the resource is a CRD or not. We cannot just continue and extract the OpenAPI entry for it if we don't know for certain that it's not a CRD (for the reasons mentioned above).
At this time, I don't really see a way forward without those permissions being available. Let me know if you have some other options in mind.
Then the only thing I can think of is to add a flag that explicitly defines whether the manifest refers to a CRD or not. It would also have the advantage of making the manifest schema more consistent. (I was a bit confused before understanding that it can effectively refer to both CRD and non-CRD resources.)
The documentation for the kubernetes_manifest resource doesn't mention it being specific to CRDs. On the contrary, it includes an example of using it to create a ConfigMap as well as a CRD. We'll try to emphasize that more, to avoid this kind of confusion.
With regards to the flag you mentioned, I'm afraid that would degrade the user experience more than it would help. For example, we expect users to use automation tools like https://github.com/jrhouston/tfk8s to convert batches of vanilla Kubernetes manifests to Terraform and adding such a flag would complicate this process unnecessarily.
While I'm thinking of alternatives, would you mind explaining in a bit more detail why you're unable to set permissions to list the CRDs in the cluster? They're schema definitions and do not usually carry sensitive information. I'm trying to understand the concerns behind not allowing (read-only) permissions to list them.
The documentation for the kubernetes_manifest resource doesn't mention it being specific to CRDs. On the contrary, it includes an example of using it to create a ConfigMap as well as a CRD. We'll try to emphasize that more, to avoid this kind of confusion.
After proofreading the doc I think it is actually ok; the confusion came more from the unexpected error message (it puzzled me for a while that I got an error about the CRD endpoint while I was not using CRDs at all, until I read the source code).
With regards to the flag you mentioned, I'm afraid that would degrade the user experience more than it would help. For example, we expect users to use automation tools like https://github.com/jrhouston/tfk8s to convert batches of vanilla Kubernetes manifests to Terraform and adding such a flag would complicate this process unnecessarily.
What I wanted to suggest was adding a tristate attribute, e.g. use_crd, with the following possible values:
- unset/null, we get the current behavior
- true, we always try to retrieve a crd
- false, we always try to retrieve other things
This way it shouldn't break any existing code.
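A minimal Go sketch of how such a tristate could be modeled, using a nil *bool for the unset case; the use_crd name comes from the suggestion above and no such attribute actually exists in the provider:

```go
package main

import "fmt"

// resolutionStrategy illustrates the proposed tristate: a *bool
// distinguishes unset (nil) from explicit true/false. Purely
// hypothetical; not part of the provider's schema.
func resolutionStrategy(useCRD *bool) string {
	switch {
	case useCRD == nil:
		return "current behavior: try the CRD API, then fall back to OpenAPI"
	case *useCRD:
		return "always resolve the type via the CRD API"
	default:
		return "always resolve the type via the cluster OpenAPI document"
	}
}

func main() {
	t, f := true, false
	fmt.Println(resolutionStrategy(nil)) // unset/null
	fmt.Println(resolutionStrategy(&t)) // true
	fmt.Println(resolutionStrategy(&f)) // false
}
```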
While I'm thinking of alternatives, would you mind explaining in a bit more detail why you're unable to set permissions to list the CRDs in the cluster? They're schema definitions and do not usually carry sensitive information. I'm trying to understand the concerns behind not allowing (read-only) permissions to list them.
Some Kubernetes clusters have very strict permission sets and don't always expose CRDs, for various reasons (mostly because the API endpoint is not exposed by the default RBAC configuration, IIRC).
So, what are most people doing here? I gave in and granted the ServiceAccount read-only access to the CRD API with a ClusterRole and a ClusterRoleBinding.
```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: crd-readonly
rules:
  - apiGroups: ["apiextensions.k8s.io"]
    resources: ["customresourcedefinitions"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: whatever-crd-readonly
subjects:
  - kind: ServiceAccount
    name: whatever
    namespace: whatever
roleRef:
  kind: ClusterRole
  name: crd-readonly
  apiGroup: rbac.authorization.k8s.io
```
It does seem like it shouldn't be required (in this case, my manifest is for a PersistentVolumeClaim), but it also doesn't seem like a huge problem unless you have some serious custom resources going on, on a cluster where this user cannot be trusted to read those definitions. Someone facing that would have to design around the restriction with separate clusters for their concerns.