
Do not fail instantly on CRD resolution error when handling a manifest

Open booti386 opened this issue 4 years ago • 6 comments

Description

CRD resolution failure can happen, e.g., when the service account does not have list access to the CRD API. However, this should not prevent the provider from trying to resolve the resource type using the cluster's OpenAPI document. This PR fixes that by moving the error check to after the OpenAPI attempt.
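A minimal sketch of the proposed control-flow change, in Go since that is the provider's language. The function and variable names here are illustrative stand-ins, not the provider's actual API: the point is that the CRD lookup error is held back until the OpenAPI fallback has also been tried.

```go
package main

import (
	"errors"
	"fmt"
)

// errForbidden stands in for the RBAC error returned when the service
// account lacks list permission on the CRD API.
var errForbidden = errors.New("customresourcedefinitions is forbidden")

// lookupCRD is a stand-in for the provider's CRD schema lookup; here it
// always fails, simulating missing permissions.
func lookupCRD(gvk string) (string, error) {
	return "", errForbidden
}

// lookupOpenAPI is a stand-in for resolving the type from the cluster's
// OpenAPI document.
func lookupOpenAPI(gvk string) (string, error) {
	if gvk == "v1/PersistentVolumeClaim" {
		return "io.k8s.api.core.v1.PersistentVolumeClaim", nil
	}
	return "", errors.New("not found in OpenAPI document")
}

// resolveType tries the CRD lookup first, but only surfaces its error
// after the OpenAPI fallback has also failed (the change proposed here).
func resolveType(gvk string) (string, error) {
	schema, crdErr := lookupCRD(gvk)
	if crdErr == nil {
		return schema, nil
	}
	// Fall back to the OpenAPI document before failing on the CRD error.
	if schema, err := lookupOpenAPI(gvk); err == nil {
		return schema, nil
	}
	return "", crdErr
}

func main() {
	s, err := resolveType("v1/PersistentVolumeClaim")
	fmt.Println(s, err) // the fallback-resolved type, with a nil error
}
```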

Acceptance tests

  • [ ] Have you added an acceptance test for the functionality being added?
  • [ ] Have you run the acceptance tests on this branch?

Output from acceptance testing:

$ make testacc TESTARGS='-run=TestAccXXX'

...

Release Note

Release note for CHANGELOG:

...

References

  • #1507

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

booti386 avatar Nov 23 '21 15:11 booti386

CLA assistant check
All committers have signed the CLA.

hashicorp-cla avatar Nov 23 '21 15:11 hashicorp-cla

Hi,

Thanks for raising this issue. I can see how permission to list CRDs should not be required as long as one doesn't actually want to manage CRDs. However, the changes proposed in this PR would have unexpected side effects when used on CRD-backed resources with a service account that doesn't have list permission. CRDs are also included in the OpenAPI document, but their schema is imprecisely translated to OpenAPI v2. In that case, if I understand this correctly, the lookup for the CRD would fail, but the type would then be found in the OpenAPI document, so the provider would end up using the incomplete schema expressed in OpenAPI v2. This would cause the same resource to have a different structure depending on the permissions of the calling identity, which is something we obviously cannot allow.

Unfortunately, even if we were to distinguish on the error type returned by the CRD lookup (to handle the missing-permissions error specifically), we would still be missing the information about whether the resource is a CRD or not. We cannot just continue and extract the OpenAPI entry for it if we don't know for certain it's not a CRD (for the reasons mentioned above).

At this time, I don't really see a way forward without those permissions being available. Let me know if you have some other options in mind.

alexsomesan avatar Nov 23 '21 23:11 alexsomesan

Then the only thing I can think of is to add a flag that explicitly defines whether the manifest refers to a CRD-backed resource or not. It would also have the advantage of making the manifest schema more consistent. (I was a bit confused before I understood that it can refer to both CRD-backed and built-in resources.)

booti386 avatar Nov 24 '21 20:11 booti386

The documentation for the kubernetes_manifest resource doesn't mention it being specific to CRDs. On the contrary, it includes an example of using it to create a ConfigMap as well as a CRD. We'll try to emphasize that more, to avoid this kind of confusion.

With regards to the flag you mentioned, I'm afraid that would degrade the user experience more than it would help. For example, we expect users to use automation tools like https://github.com/jrhouston/tfk8s to convert batches of vanilla Kubernetes manifests to Terraform and adding such a flag would complicate this process unnecessarily.

While I'm thinking of alternatives, would you mind explaining in a bit more detail why you're unable to set permissions to list the CRDs in the cluster? They're schema definitions and usually do not carry sensitive information. I'm trying to understand the concerns behind not allowing read-only permissions to list them.

alexsomesan avatar Nov 25 '21 16:11 alexsomesan

> The documentation for the kubernetes_manifest resource doesn't mention it being specific to CRDs. On the contrary, it includes an example of using it to create a ConfigMap as well as a CRD. We'll try to emphasize that more, to avoid this kind of confusion.

After proofreading the doc I think it is actually ok; it was more the unexpected error message that caused confusion (it puzzled me for a while that I got an error message about the CRD endpoint while I was not using CRDs at all, until I read the source code).

> With regards to the flag you mentioned, I'm afraid that would degrade the user experience more than it would help. For example, we expect users to use automation tools like https://github.com/jrhouston/tfk8s to convert batches of vanilla Kubernetes manifests to Terraform and adding such a flag would complicate this process unnecessarily.

What I wanted to suggest was to add a tristate attribute, e.g. use_crd, with the following possible values:

  • unset/null: we get the current behavior
  • true: we always try to resolve the type as a CRD
  • false: we always try to resolve it as a built-in type

This way it shouldn't break any existing code.
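The proposal above could be sketched in HCL roughly as follows. Note that use_crd is the hypothetical attribute being proposed here; it is not implemented by the provider:

```hcl
resource "kubernetes_manifest" "example" {
  # Hypothetical tristate attribute (not an existing provider argument):
  #   unset -> current behavior (CRD lookup, then OpenAPI fallback)
  #   true  -> resolve the type only via the CRD API
  #   false -> resolve the type only via the cluster's OpenAPI document
  use_crd = false

  manifest = {
    apiVersion = "v1"
    kind       = "ConfigMap"
    metadata = {
      name      = "example"
      namespace = "default"
    }
    data = {
      foo = "bar"
    }
  }
}
```

Since the attribute defaults to unset, existing configurations would keep their current behavior.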

> While I'm thinking of alternatives, would you mind explaining in a bit more detail why you're unable to set permissions to list the CRDs in the cluster? They're schema definitions and do not carry sensitive information usually. I'm trying to understand the concerns behind not allowing permissions to list them (read-only).

Some Kubernetes clusters have very strict permissions set, and don't always expose CRDs for various reasons (mostly because the API endpoint is not exposed by the default RBAC configuration, IIRC).

booti386 avatar Nov 26 '21 14:11 booti386

So, what are most people doing here? I gave in and granted the ServiceAccount read-only access to the CRD API, with a ClusterRole and a ClusterRoleBinding:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: crd-readonly
rules:
  - apiGroups: ["apiextensions.k8s.io"]
    resources: ["customresourcedefinitions"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: whatever-crd-readonly
subjects:
  - kind: ServiceAccount
    name: whatever
    namespace: whatever
roleRef:
  kind: ClusterRole
  name: crd-readonly
  apiGroup: rbac.authorization.k8s.io

It does seem like this shouldn't be required (in this case, my manifest is for a PersistentVolumeClaim), but it also doesn't seem to be a huge problem unless you have some serious custom resources going on, on a cluster where this user cannot be trusted to read those definitions. Someone facing that would have to design around the restriction, with separate clusters for their separate concerns.
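For anyone debugging the same error, a quick way to check whether the identity running Terraform can list CRDs is kubectl's built-in access check. The ServiceAccount name below is a placeholder matching the example manifest above:

```shell
# Ask the API server whether the ServiceAccount may list CRDs.
# Uses impersonation, so your own identity must be allowed to impersonate it.
kubectl auth can-i list customresourcedefinitions.apiextensions.k8s.io \
  --as=system:serviceaccount:whatever:whatever
# Prints "yes" or "no" (and exits non-zero on "no").
```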

mike-sol avatar Sep 28 '23 05:09 mike-sol