failed to determine if the following GVK is namespaced if CRD is created in the same run
What happened?
I'm deploying the cert-manager Helm release plus a ClusterIssuer resource.
The cert-manager Helm release creates the ClusterIssuer CRD, but the kubernetes.yaml.v2.ConfigGroup that creates the ClusterIssuer resource seems to check properties of the ClusterIssuer CRD before the CRD actually exists, even though there is a depends_on on the cert-manager Release:
Exception: marshaling properties: awaiting input property "resources": failed to determine if the following GVK is namespaced: cert-manager.io/v1, Kind=ClusterIssuer
If I rerun pulumi up -y --skip-preview afterwards, the ClusterIssuer is created fine. That's why I think it's a timing issue between the creation of the ClusterIssuer CRD and the actual ClusterIssuer resource.
Example
cert_manager = helmv3.Release(
    "cert_manager",
    helmv3.ReleaseArgs(
        # https://cert-manager.io/docs/installation/helm/
        # https://artifacthub.io/packages/helm/cert-manager/cert-manager
        # https://github.com/cert-manager/cert-manager
        name="cert-manager",
        chart="cert-manager",
        namespace=namespace.id,
        version="1.15.2",
        repository_opts=helmv3.RepositoryOptsArgs(
            repo="https://charts.jetstack.io",
        ),
        values=cert_manager_helm_values,
    ),
    opts=pulumi.ResourceOptions(
        provider=kubernetes_provider,
    ),
)
def generate_clusterissuer_manifest(name, server):
    def func(args):
        template = env.get_template("letsencrypt-clusterissuer.j2.yaml")
        rendered = template.render(
            domain_name=args["domain_name"],
            region=args["region"],
            zone_id=args["zone_id"],
            name=name,
            server=server,
        )
        return rendered

    return pulumi.Output.all(
        domain_name=domain_name,
        region=region,
        zone_id=zone.id,
    ).apply(func)
# https://www.pulumi.com/registry/packages/kubernetes/api-docs/yaml/configgroup/
letsencrypt_staging_cluster_issuer_cg = kubernetes.yaml.v2.ConfigGroup(
    "letsencrypt-staging",
    yaml=generate_clusterissuer_manifest(
        name="letsencrypt-staging",
        server="https://acme-staging-v02.api.letsencrypt.org/directory",
    ),
    opts=pulumi.ResourceOptions(
        depends_on=[
            cert_manager,
        ],
        provider=kubernetes_provider,
    ),
)
Output of pulumi about
pulumi about
CLI
Version 3.129.0
Go Version go1.22.6
Go Compiler gc
Plugins
KIND NAME VERSION
resource aws 6.49.1
resource eks 2.7.8
resource kubernetes 4.17.1
language python unknown
resource random 4.16.3
Host
OS darwin
Version 14.6.1
Arch x86_64
This project is written in python: executable='/Users/xxx/git/pulumi-aws-ecerulm/venv/bin/python' version='3.12.5'
...
Backend
Name xxxxx
URL file://~
User xxxx
Organizations
Token type personal
Dependencies:
NAME VERSION
Jinja2 3.1.4
pip 24.2
pulumi_eks 2.7.8
pulumi_random 4.16.3
setuptools 72.2.0
wheel 0.44.0
Additional context
No response
The workaround I use now is to create the CRDs myself with a separate kubernetes.yaml.v2.ConfigFile that contains just the CRDs:
# https://www.pulumi.com/registry/packages/kubernetes/api-docs/yaml/configfile/
crds = kubernetes.yaml.v2.ConfigFile(
    "letsencrypt-prod",
    file="files/cert-manager.crds.yaml",
    opts=pulumi.ResourceOptions(
        depends_on=[],
        provider=kubernetes_provider,
    ),
)
Then I set installCRDs: false in the cert-manager Helm values, and make the ClusterIssuer depend on both the crds ConfigFile and cert-manager's Release.
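Roughly, the wiring then looks like this (a sketch that reuses the crds, cert_manager, and kubernetes_provider names from the snippets above):

# Sketch of the workaround wiring.
cert_manager_helm_values = {
    # In the Helm values passed to the cert-manager Release above:
    "installCRDs": False,  # CRDs are applied by the separate `crds` ConfigFile instead
    # ... other values ...
}

letsencrypt_staging_cluster_issuer_cg = kubernetes.yaml.v2.ConfigGroup(
    "letsencrypt-staging",
    yaml=generate_clusterissuer_manifest(
        name="letsencrypt-staging",
        server="https://acme-staging-v02.api.letsencrypt.org/directory",
    ),
    opts=pulumi.ResourceOptions(
        depends_on=[
            crds,          # the CRD manifests
            cert_manager,  # the controller that reconciles ClusterIssuers
        ],
        provider=kubernetes_provider,
    ),
)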
I faced a similar issue with multiple Releases, e.g. Kyverno and Karpenter. After quite a lot of testing, here is my conclusion:
kubernetes.yaml.v2.ConfigFile does not seem to work where kubernetes.yaml.ConfigFile does.
A little test I ran, trying to create the same resource with both providers (I made sure there was no clash between the manifests):
Updating (dev):
Type Name Status Info
pulumi:pulumi:Stack pulumi-aetion-dev **failed** 1 error
└─ pulumi-project:k8s:core-components core-components
+ ├─ kubernetes:helm.sh/v3:Release helm-kyverno created (67s)
+ ├─ kubernetes:yaml:ConfigFile kyverno-sync-secret-crd created
+ │ └─ kubernetes:kyverno.io/v1:ClusterPolicy sync-secrets created (3s)
+ └─ kubernetes:yaml/v2:ConfigFile kyverno-sync-secret-crd2 created
Diagnostics:
pulumi:pulumi:Stack (pulumi-aetion-dev):
error: kubernetes:yaml/v2:ConfigFile resource 'kyverno-sync-secret-crd2' has a problem: marshaling properties: awaiting input property "resources": failed to determine if the following GVK is namespaced: kyverno.io/v1, Kind=ClusterPolicy
No amount of waiting or anything like that helps. As a matter of fact, Helm creates CRDs very early, way before the Pods are rolled out and before the ConfigFile resource tries to apply its manifest (it's easy to check with kubectl while the program is running).
My suspicion lies here, but I was not able to test it. The fact is, it always works on a second run.
Anyway, I'm not sure what the difference is between the v1 and v2 providers, but I'll stick with v1 for now as it seems to work just fine for what I'm doing.
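For reference, the v1 variant I'm sticking with looks roughly like this (a sketch; the file path, Release variable, and provider name are placeholders, not my exact code):

import pulumi
import pulumi_kubernetes as kubernetes

# In my testing the classic (v1) ConfigFile applies the manifest fine,
# while the yaml/v2 one fails during preview with the "namespaced" error.
kyverno_sync_secret_crd = kubernetes.yaml.ConfigFile(
    "kyverno-sync-secret-crd",
    file="manifests/kyverno-sync-secrets-clusterpolicy.yaml",  # placeholder path
    opts=pulumi.ResourceOptions(
        depends_on=[helm_kyverno],     # the helm.sh/v3 Release installing Kyverno
        provider=kubernetes_provider,  # placeholder provider
    ),
)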
The basic requirement is that the CRD be definitely installed before any CRs that depend on it are registered. This requirement is usually solved with the dependsOn option between a component that installs the operator and another that uses the installed types.
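For example, something like this (a minimal sketch in Python, with placeholder chart, repo, and GVK names):

import pulumi
import pulumi_kubernetes as kubernetes

# Component that installs the operator and its CRDs (placeholder chart/repo).
operator = kubernetes.helm.v3.Release(
    "some-operator",
    chart="some-operator",
    repository_opts=kubernetes.helm.v3.RepositoryOptsArgs(
        repo="https://example.com/charts",
    ),
)

# Component that uses the installed types; dependsOn makes it wait for the
# operator (and therefore the CRD) to be created first.
custom_resources = kubernetes.yaml.v2.ConfigGroup(
    "custom-resources",
    yaml="""
apiVersion: example.com/v1   # placeholder GVK served by the operator's CRD
kind: Widget
metadata:
  name: my-widget
""",
    opts=pulumi.ResourceOptions(depends_on=[operator]),
)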
In preview mode, the provider maintains a cache of the CRDs that are planned, so that Pulumi may determine whether a given CRD is namespaced or cluster-scoped. The Release/v3 resource unfortunately doesn't contribute information to said cache, which may lead to the "failed to determine if the following GVK is namespaced" error. We're tracking this limitation in https://github.com/pulumi/pulumi-kubernetes/issues/3299.
@EronWright so what's the workaround?
Seeing this in Pulumi YAML:
rabbitmq-operator:
  type: command:local:Command
  properties:
    create: kubectl apply -f https://github.com/rabbitmq/cluster-operator/releases/download/v2.12.1/cluster-operator.yml
    delete: kubectl delete -f https://github.com/rabbitmq/cluster-operator/releases/download/v2.12.1/cluster-operator.yml

rabbitmq-deploy:
  type: kubernetes:yaml/v2:ConfigGroup
  properties:
    yaml: |
      apiVersion: rabbitmq.com/v1beta1
      kind: RabbitmqCluster
      metadata:
        name: rabbitmq
        namespace: rabbitmq
      spec:
        replicas: 3
        override:
          statefulSet:
            spec:
              template:
                spec:
                  containers: []
                  priorityClassName: high-priority
  options:
    parent: ${rabbitmq-operator}
pulumi up error:
error: kubernetes:yaml/v2:ConfigGroup resource 'rabbitmq-deploy' has a problem: marshaling properties: awaiting input property "resources": failed to determine if the following GVK is namespaced: rabbitmq.com/v1beta1, Kind=RabbitmqCluster
pulumi about:
CLI
Version 3.162.0
Go Version go1.24.2
Go Compiler gc
Plugins
KIND NAME VERSION
resource command unknown
resource gcp unknown
resource kubernetes unknown
language yaml 1.16.0
Host
OS ubuntu
Version 22.04
Arch x86_64
pulumi plugin ls:
NAME KIND VERSION SIZE INSTALLED LAST USED
aws resource 6.70.1 742 MB 1 month ago 1 hour ago
aws resource 6.52.0 886 MB 6 months ago 1 hour ago
command resource 1.0.1 36 MB 6 months ago 1 hour ago
gcp resource 8.20.0 185 MB 1 month ago 1 hour ago
gcp resource 8.19.1 184 MB 1 month ago 1 hour ago
gcp resource 8.6.0 239 MB 5 months ago 1 hour ago
gcp resource 8.1.0 238 MB 6 months ago 1 hour ago
github resource 6.3.0 46 MB 6 months ago 1 hour ago
kubernetes resource 4.22.2 150 MB 1 hour ago 1 hour ago
kubernetes resource 4.18.1 212 MB 6 months ago 1 hour ago
port resource 2.2.2 64 MB 1 month ago 1 hour ago
pulumiservice resource 0.26.0 34 MB 6 months ago 1 hour ago
random resource 4.16.5 74 MB 6 months ago 1 hour ago
std resource 1.7.3 23 MB 4 months ago 1 hour ago
TOTAL plugin cache size: 3.1 GB
Came here in search of help, because my error was very similar:
kubernetes:helm.sh/v4:Chart resource 'ciliumHelmRelease' has a problem: failed to determine if the following GVK is namespaced: cert-manager.io/v1, Kind=Certificate
In my case it was a race condition between deploying Cilium, where Hubble gets its certificate via cert-manager, and deploying cert-manager with the necessary CRDs. My workaround is to remove the Hubble cert-manager integration for now and think about a better solution later.
I would advocate for a limited fix where we hardcode a list of known GVKs, to serve as an alternative mechanism to API Server discovery. This would allow the provider to make progress in a lot of situations that are otherwise tricky to handle.
The specific GVKs would be from cert-manager, Istio, Cilium, etc.
@EronWright elsewhere when we get this error we assume the object is namespaced, presumably to allow things to continue gracefully. Could we not do something similar in this case?
https://github.com/pulumi/pulumi-kubernetes/blob/bc3c4569d3ef3997142d0fdd3141cd087df7acd0/provider/pkg/provider/provider.go#L1370-L1376
any movement here? IMHO this is a fairly critical bug.
I am running into a similar issue with a deployment of the External Secrets Helm chart, and an in-house Helm chart called argocd-support that contains several ExternalSecret (external-secrets.io/v1) instance resources. The latter is set to depend on the former, yet the Pulumi Preview step seems to be ignoring that relationship, and fails[^1] with the below error:
error: kubernetes:helm.sh/v4:Chart resource 'sbox-green-applications-argocd-argocd-support-helm-chart' has a problem: failed to determine if the following GVK is namespaced: external-secrets.io/v1, Kind=ExternalSecret
Both External Secrets and argocd-support are k8s.helm.v4.Chart resources. @EronWright you mentioned an issue with this and Release/v3 resources, would that CRD cache problem exist with Chart/v4 as well?
Here's the code I'm using for creating the External Secrets Helm chart:
def create_external_secrets_helm_chart(self, depends_on: DependsOn = None):
    return k8s.helm.v4.Chart(
        resource_name=f'{self.name}-helm-chart',
        name='external-secrets',
        opts=pulumi.ResourceOptions(parent=self, depends_on=depends_on),
        repository_opts=k8s.helm.v4.RepositoryOptsArgs(
            repo='https://charts.external-secrets.io'
        ),
        chart='external-secrets',
        version=self.args.external_secrets_chart_version,
        namespace='external-secrets',
        value_yaml_files=[FileAsset(path='helm/external-secrets/values.yaml')],
        values=self.external_secrets_service_account_role.arn.apply(lambda role_arn: {
            'region': 'us-west-2',
            'role': role_arn,
            'serviceAccount': {
                'annotations': {
                    'eks.amazonaws.com/role-arn': role_arn
                }
            }
        }),
    )
And here's the code for the other chart (it doesn't contain ArgoCD, it's just supporting resources, such as those ExternalSecrets):
def create_argocd_support_helm_chart(self, depends_on: DependsOn = None) -> k8s.helm.v4.Chart:
    return k8s.helm.v4.Chart(
        resource_name=f'{self.name}-argocd-support-helm-chart',
        name='argocd-support',
        opts=pulumi.ResourceOptions(parent=self, depends_on=depends_on),
        chart='./helm/argocd-support',
        namespace='argocd',
        value_yaml_files=[
            FileAsset(path='helm/argocd/values.yaml'),
            FileAsset(path=f'helm/argocd/values.{self.env.tier}.yaml'),
        ],
    )
Note that the depends_on relationship is between these two resources' grandparent ComponentResources. The hierarchy looks like this, with Applications depending on Security:
Cluster
|-- Security
|   |-- ExternalSecrets
|       |-- external-secrets Helm chart
|-- Applications
    |-- ArgoCD
        |-- argocd-support Helm chart
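To make the wiring concrete, here is a simplified sketch (stand-in component classes, not my actual code):

import pulumi

# Illustrative stand-ins for the components in the tree above.
class Security(pulumi.ComponentResource):
    def __init__(self, name, opts=None):
        super().__init__("pkg:index:Security", name, None, opts)
        # ... creates ExternalSecrets, which creates the external-secrets helm.v4.Chart ...
        self.register_outputs({})

class Applications(pulumi.ComponentResource):
    def __init__(self, name, opts=None):
        super().__init__("pkg:index:Applications", name, None, opts)
        # ... creates ArgoCD, which creates the argocd-support helm.v4.Chart ...
        self.register_outputs({})

security = Security("security")
applications = Applications(
    "applications",
    # Applications (and everything under it) is declared to depend on Security.
    opts=pulumi.ResourceOptions(depends_on=[security]),
)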
Interestingly, the error goes away if I switch argocd-support to use k8s.helm.v3.Release. So it does seem like this is specifically an issue with Chart/v4, and it may not be the same as the OP's issue. Regardless, I would like to know what is going on here and why using k8s.helm.v4.Chart means that Pulumi ignores the depends_on relationship, causing the Preview step to fail.
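For reference, the Release/v3 variant that does not hit the error looks roughly like this (a sketch; values and other options elided, and the helper name is mine):

def create_argocd_support_helm_release(self, depends_on: DependsOn = None):
    # Same chart, deployed via helm.v3.Release instead of helm.v4.Chart;
    # with this, the preview-time GVK lookup no longer fails for me.
    return k8s.helm.v3.Release(
        resource_name=f'{self.name}-argocd-support-helm-release',
        name='argocd-support',
        chart='./helm/argocd-support',
        namespace='argocd',
        opts=pulumi.ResourceOptions(parent=self, depends_on=depends_on),
    )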
[^1]: This only starts to happen when the Preview step "sees" farther into the deployment, i.e. if the initial deployment fails and has to be restarted in the middle, so now Preview will plan more steps. At least that's my interpretation here, it's not completely clear. Once it starts happening it will continue to happen until I disable the argocd-support Helm chart.