failed to determine if the following GVK is namespaced if CRD is created in the same run

Open · ecerulm opened this issue 1 year ago

What happened?

I'm deploying the cert-manager Helm release plus a ClusterIssuer resource.

The cert-manager Helm release creates the ClusterIssuer CRD, but the kubernetes.yaml.v2.ConfigGroup that creates the ClusterIssuer resource seems to check properties of the ClusterIssuer CRD before the CRD actually exists, even though there is a depends_on on the cert-manager release.

    Exception: marshaling properties: awaiting input property "resources": failed to determine if the following GVK is namespaced: cert-manager.io/v1, Kind=ClusterIssuer

If I then rerun pulumi up -y --skip-preview, the ClusterIssuer is created fine. That's why I think it's a timing issue between the creation of the ClusterIssuer CRD and the actual ClusterIssuer resource.

Example

cert_manager = helmv3.Release(
    "cert_manager",
    helmv3.ReleaseArgs(
        # https://cert-manager.io/docs/installation/helm/
        # https://artifacthub.io/packages/helm/cert-manager/cert-manager
        # https://github.com/cert-manager/cert-manager
        name="cert-manager",
        chart="cert-manager",
        namespace=namespace.id,
        version="1.15.2",
        repository_opts=helmv3.RepositoryOptsArgs(
            repo="https://charts.jetstack.io",
        ),
        values=cert_manager_helm_values,
    ),
    opts=pulumi.ResourceOptions(
        provider=kubernetes_provider,
    ),
)


def generate_clusterissuer_manifest(name, server):
    # Render the ClusterIssuer manifest from a Jinja2 template once the
    # Pulumi Outputs (domain name, region, hosted zone id) have resolved.
    def func(args):
        template = env.get_template("letsencrypt-clusterissuer.j2.yaml")
        rendered = template.render(
            domain_name=args["domain_name"],
            region=args["region"],
            zone_id=args["zone_id"],
            name=name,
            server=server,
        )
        return rendered

    return pulumi.Output.all(
        domain_name=domain_name,
        region=region,
        zone_id=zone.id,
    ).apply(func)


# https://www.pulumi.com/registry/packages/kubernetes/api-docs/yaml/configgroup/
letsencrypt_staging_cluster_issuer_cg = kubernetes.yaml.v2.ConfigGroup(
    "letsencrypt-staging",
    yaml=generate_clusterissuer_manifest(
        name="letsencrypt-staging",
        server="https://acme-staging-v02.api.letsencrypt.org/directory",
    ),
    opts=pulumi.ResourceOptions(
        depends_on=[
            cert_manager,
        ],
        provider=kubernetes_provider,
    ),
)


Output of pulumi about

pulumi about
CLI          
Version      3.129.0
Go Version   go1.22.6
Go Compiler  gc

Plugins
KIND      NAME        VERSION
resource  aws         6.49.1
resource  eks         2.7.8
resource  kubernetes  4.17.1
language  python      unknown
resource  random      4.16.3

Host     
OS       darwin
Version  14.6.1
Arch     x86_64

This project is written in python: executable='/Users/xxx/git/pulumi-aws-ecerulm/venv/bin/python' version='3.12.5'


...

Backend        
Name           xxxxx
URL            file://~
User           xxxx
Organizations  
Token type     personal

Dependencies:
NAME           VERSION
Jinja2         3.1.4
pip            24.2
pulumi_eks     2.7.8
pulumi_random  4.16.3
setuptools     72.2.0
wheel          0.44.0

Additional context

No response

Contributing

Vote on this issue by adding a 👍 reaction. To contribute a fix for this issue, leave a comment (and link to your pull request, if you've opened one already).

ecerulm avatar Aug 19 '24 13:08 ecerulm

The workaround I use now is to create the CRDs myself with a separate kubernetes.yaml.v2.ConfigFile that contains just the CRDs:

# https://www.pulumi.com/registry/packages/kubernetes/api-docs/yaml/configfile/
crds = kubernetes.yaml.v2.ConfigFile(
    "letsencrypt-prod",
    file="files/cert-manager.crds.yaml",
    opts=pulumi.ResourceOptions(
        provider=kubernetes_provider,
    ),
)

Then I set installCRDs: false in the cert-manager Helm values, and make the ClusterIssuer depend on both the crds ConfigFile and cert-manager's Release.
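
Roughly, the wiring then looks like this (a sketch following the variable names in the snippets above, and assuming cert_manager_helm_values is a plain dict):

# cert-manager release no longer installs the CRDs itself:
cert_manager = helmv3.Release(
    "cert_manager",
    helmv3.ReleaseArgs(
        name="cert-manager",
        chart="cert-manager",
        namespace=namespace.id,
        version="1.15.2",
        repository_opts=helmv3.RepositoryOptsArgs(repo="https://charts.jetstack.io"),
        values={**cert_manager_helm_values, "installCRDs": False},
    ),
    opts=pulumi.ResourceOptions(provider=kubernetes_provider),
)

# ...and the ClusterIssuer depends on both the standalone CRDs and the release:
letsencrypt_staging_cluster_issuer_cg = kubernetes.yaml.v2.ConfigGroup(
    "letsencrypt-staging",
    yaml=generate_clusterissuer_manifest(
        name="letsencrypt-staging",
        server="https://acme-staging-v02.api.letsencrypt.org/directory",
    ),
    opts=pulumi.ResourceOptions(
        depends_on=[crds, cert_manager],
        provider=kubernetes_provider,
    ),
)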

ecerulm avatar Aug 19 '24 14:08 ecerulm

I faced a similar issue with multiple Releases, e.g. Kyverno and Karpenter. After quite a lot of testing, here is my conclusion:

kubernetes.yaml.v2.ConfigFile does not seem to work in cases where kubernetes.yaml.ConfigFile does.

A little test I ran, trying to create the same resource but with a different provider (I made sure there was no clash in the manifests):

Updating (dev):
     Type                                             Name                          Status            Info
     pulumi:pulumi:Stack                              pulumi-aetion-dev             **failed**        1 error
     └─ pulumi-project:k8s:core-components             core-components
 +      ├─ kubernetes:helm.sh/v3:Release              helm-kyverno                  created (67s)
 +      ├─ kubernetes:yaml:ConfigFile                 kyverno-sync-secret-crd       created
 +      │  └─ kubernetes:kyverno.io/v1:ClusterPolicy  sync-secrets                  created (3s)
 +      └─ kubernetes:yaml/v2:ConfigFile              kyverno-sync-secret-crd2       created

Diagnostics:
  pulumi:pulumi:Stack (pulumi-aetion-dev):
    error: kubernetes:yaml/v2:ConfigFile resource 'kyverno-sync-secret-crd2' has a problem: marshaling properties: awaiting input property "resources": failed to determine if the following GVK is namespaced: kyverno.io/v1, Kind=ClusterPolicy

No amount of waiting or anything like that helps. As a matter of fact, Helm creates the CRDs very early, well before the Pods are rolled out and before the ConfigFile resource tries to apply its manifest (this is easy to check with kubectl while the program is running).

My suspicion lies here, but I was not able to test it. The fact is, it always works on a second run.

Anyway, I'm not sure what the difference is between the v1 and v2 providers, but I'll stick with v1 for now as it seems to be working just fine for what I'm doing.
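
The v1 fallback looks roughly like this (a sketch; the file path and the helm_kyverno handle are illustrative):

import pulumi
import pulumi_kubernetes as kubernetes

# yaml (v1) ConfigFile: in my testing this applied fine where yaml/v2 failed
# with the GVK-scope error, given a depends_on on the Release that installs
# the CRD.
kyverno_sync_secret_crd = kubernetes.yaml.ConfigFile(
    "kyverno-sync-secret-crd",
    file="manifests/kyverno-sync-secret.yaml",
    opts=pulumi.ResourceOptions(depends_on=[helm_kyverno]),
)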

btuffreau avatar Oct 30 '24 13:10 btuffreau

The basic requirement is that the CRD definitely be installed before any CRs that depend on it are registered. This requirement is usually satisfied with the dependsOn option between a component that installs the operator and another that uses the installed types.

In preview mode, the provider maintains a cache of the CRDs that are planned, so that Pulumi may determine whether a given CRD is namespaced or cluster-scoped. The Release/v3 resource unfortunately doesn't contribute information to said cache, which may lead to the "failed to determine if the following GVK is namespaced" error. We're tracking this limitation in https://github.com/pulumi/pulumi-kubernetes/issues/3299.
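
As a concrete illustration of that dependsOn pattern (a sketch with illustrative names, mirroring the CRD ConfigFile workaround earlier in this thread):

import pulumi
import pulumi_kubernetes as kubernetes

# Install the operator's CRDs with a yaml/v2 resource; its planned objects are
# visible to the provider during preview, unlike those rendered by Release/v3.
crds = kubernetes.yaml.v2.ConfigFile(
    "cert-manager-crds",
    file="files/cert-manager.crds.yaml",
)

# Components that register CRs of those types depend on the CRD install.
issuers = kubernetes.yaml.v2.ConfigGroup(
    "cluster-issuers",
    yaml=cluster_issuer_yaml,  # rendered elsewhere
    opts=pulumi.ResourceOptions(depends_on=[crds]),
)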

EronWright avatar Oct 31 '24 20:10 EronWright

@EronWright so what's the workaround?

trondhindenes avatar Jan 10 '25 08:01 trondhindenes

Seeing this in Pulumi YAML:

  rabbitmq-operator:
    type: command:local:Command
    properties:
      create: kubectl apply -f https://github.com/rabbitmq/cluster-operator/releases/download/v2.12.1/cluster-operator.yml
      delete: kubectl delete -f https://github.com/rabbitmq/cluster-operator/releases/download/v2.12.1/cluster-operator.yml

  rabbitmq-deploy:
    type: kubernetes:yaml/v2:ConfigGroup
    properties:
      yaml: |
        apiVersion: rabbitmq.com/v1beta1
        kind: RabbitmqCluster
        metadata:
          name: rabbitmq
          namespace: rabbitmq
        spec:
          replicas: 3
          override:
            statefulSet:
              spec:
                template:
                  spec:
                    containers: []
                    priorityClassName: high-priority
    options:
      parent: ${rabbitmq-operator}

pulumi up error:

error: kubernetes:yaml/v2:ConfigGroup resource 'rabbitmq-deploy' has a problem: marshaling properties: awaiting input property "resources": failed to determine if the following GVK is namespaced: rabbitmq.com/v1beta1, Kind=RabbitmqCluster

pulumi about:

CLI
Version      3.162.0
Go Version   go1.24.2
Go Compiler  gc

Plugins
KIND      NAME        VERSION
resource  command     unknown
resource  gcp         unknown
resource  kubernetes  unknown
language  yaml        1.16.0

Host
OS       ubuntu
Version  22.04
Arch     x86_64

pulumi plugin ls:

NAME           KIND      VERSION  SIZE    INSTALLED     LAST USED
aws            resource  6.70.1   742 MB  1 month ago   1 hour ago
aws            resource  6.52.0   886 MB  6 months ago  1 hour ago
command        resource  1.0.1    36 MB   6 months ago  1 hour ago
gcp            resource  8.20.0   185 MB  1 month ago   1 hour ago
gcp            resource  8.19.1   184 MB  1 month ago   1 hour ago
gcp            resource  8.6.0    239 MB  5 months ago  1 hour ago
gcp            resource  8.1.0    238 MB  6 months ago  1 hour ago
github         resource  6.3.0    46 MB   6 months ago  1 hour ago
kubernetes     resource  4.22.2   150 MB  1 hour ago    1 hour ago
kubernetes     resource  4.18.1   212 MB  6 months ago  1 hour ago
port           resource  2.2.2    64 MB   1 month ago   1 hour ago
pulumiservice  resource  0.26.0   34 MB   6 months ago  1 hour ago
random         resource  4.16.5   74 MB   6 months ago  1 hour ago
std            resource  1.7.3    23 MB   4 months ago  1 hour ago

TOTAL plugin cache size: 3.1 GB

nstires-ctgx avatar Apr 15 '25 20:04 nstires-ctgx

Came here in search of help, because my error was very similar: kubernetes:helm.sh/v4:Chart resource 'ciliumHelmRelease' has a problem: failed to determine if the following GVK is namespaced: cert-manager.io/v1, Kind=Certificate

In my case it was a race condition between deploying Cilium (where Hubble gets its certificate via cert-manager) and deploying cert-manager and the necessary CRDs. My workaround is to remove the Hubble cert-manager integration for now and think about a better solution later.
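
For anyone hitting the same thing: "removing the integration" boils down to switching Hubble TLS back to Helm-generated certificates in the Cilium chart values, roughly like this (the exact keys are written from memory, so check them against the Cilium chart's values.yaml):

cilium_values = {
    "hubble": {
        "tls": {
            "auto": {
                # "certmanager" makes the chart render cert-manager Certificate
                # resources, which is what triggers the GVK lookup above;
                # "helm" keeps certificate generation inside the chart.
                "method": "helm",
            },
        },
    },
}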

thetillhoff avatar Jul 09 '25 21:07 thetillhoff

I would advocate for a limited fix where we hardcode a list of known GVKs, to serve as an alternative mechanism to API Server discovery. This would allow the provider to make progress in a lot of situations that are otherwise tricky to handle.

The specific GVKs would be from cert-manager, Istio, Cilium, etc.

EronWright avatar Sep 25 '25 21:09 EronWright

@EronWright elsewhere, when we get this error, we assume the object is namespaced, presumably to allow things to continue gracefully. Could we not do something similar in this case?

https://github.com/pulumi/pulumi-kubernetes/blob/bc3c4569d3ef3997142d0fdd3141cd087df7acd0/provider/pkg/provider/provider.go#L1370-L1376

blampe avatar Sep 26 '25 21:09 blampe

Any movement here? IMHO this is a fairly critical bug.

trondhindenes avatar Oct 17 '25 09:10 trondhindenes

I am running into a similar issue with a deployment of the External Secrets Helm chart, and an in-house Helm chart called argocd-support that contains several ExternalSecret (external-secrets.io/v1) instance resources. The latter is set to depend on the former, yet the Pulumi Preview step seems to be ignoring that relationship, and fails[^1] with the below error:

error: kubernetes:helm.sh/v4:Chart resource 'sbox-green-applications-argocd-argocd-support-helm-chart' has a problem: failed to determine if the following GVK is namespaced: external-secrets.io/v1, Kind=ExternalSecret

Both External Secrets and argocd-support are k8s.helm.v4.Chart resources. @EronWright, you mentioned an issue with the CRD cache and Release/v3 resources; would that problem exist with Chart/v4 as well?

Here's the code I'm using for creating the External Secrets Helm chart:

    def create_external_secrets_helm_chart(self, depends_on: DependsOn = None):
        return k8s.helm.v4.Chart(
            resource_name=f'{self.name}-helm-chart',
            name='external-secrets',
            opts=pulumi.ResourceOptions(parent=self, depends_on=depends_on),
            repository_opts=k8s.helm.v4.RepositoryOptsArgs(
                repo='https://charts.external-secrets.io'
            ),
            chart='external-secrets',
            version=self.args.external_secrets_chart_version,
            namespace='external-secrets',
            value_yaml_files=[FileAsset(path='helm/external-secrets/values.yaml')],
            values=self.external_secrets_service_account_role.arn.apply(lambda role_arn: {
                'region': 'us-west-2',
                'role': role_arn,
                'serviceAccount': {
                    'annotations': {
                        'eks.amazonaws.com/role-arn': role_arn
                    }
                }
            }),
        )

And here's the code for the other chart (it doesn't contain ArgoCD, it's just supporting resources, such as those ExternalSecrets):

    def create_argocd_support_helm_chart(self, depends_on: DependsOn = None) -> k8s.helm.v4.Chart:
        return k8s.helm.v4.Chart(
            resource_name=f'{self.name}-argocd-support-helm-chart',
            name='argocd-support',
            opts=pulumi.ResourceOptions(parent=self, depends_on=depends_on),
            chart='./helm/argocd-support',
            namespace='argocd',
            value_yaml_files=[FileAsset(path='helm/argocd/values.yaml'), FileAsset(path=f'helm/argocd/values.{self.env.tier}.yaml')]
        )

Note that the depends_on relationship is between these two resources' grandparent ComponentResources. The hierarchy looks like this, with Applications depending on Security:

 Cluster
    |-- Security
           |-- ExternalSecrets
                 |-- external-secrets Helm chart
    |-- Applications
           |-- ArgoCD
                 |-- argocd-support Helm chart

Interestingly, the error goes away if I switch argocd-support to use k8s.helm.v3.Release. So it does seem like this is specifically an issue with Chart/v4, and it may not be the same as the OP's issue. Regardless, I would like to know what is going on here and why using k8s.helm.v4.Chart means that Pulumi ignores the depends_on relationship, causing the Preview step to fail.
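
For completeness, the Release/v3 variant that avoids the error looks roughly like this (a sketch based on the Chart/v4 code above):

    def create_argocd_support_helm_release(self, depends_on: DependsOn = None) -> k8s.helm.v3.Release:
        # Same chart and values files as the Chart/v4 version above, deployed
        # via Release/v3 instead.
        return k8s.helm.v3.Release(
            resource_name=f'{self.name}-argocd-support-helm-release',
            name='argocd-support',
            opts=pulumi.ResourceOptions(parent=self, depends_on=depends_on),
            chart='./helm/argocd-support',
            namespace='argocd',
            value_yaml_files=[
                FileAsset(path='helm/argocd/values.yaml'),
                FileAsset(path=f'helm/argocd/values.{self.env.tier}.yaml'),
            ],
        )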

[^1]: This only starts to happen when the Preview step "sees" farther into the deployment, i.e. if the initial deployment fails and has to be restarted in the middle, so that Preview now plans more steps. At least that's my interpretation; it's not completely clear. Once it starts happening, it continues to happen until I disable the argocd-support Helm chart.

js-michaels avatar Oct 20 '25 22:10 js-michaels