cannot patch "datadogagents.datadoghq.com" with kind CustomResourceDefinition when updating operator via Helm & Terraform
Output of the info page (if this is a bug)
Terraform used the selected providers to generate the following execution
plan. Resource actions are indicated with the following symbols:
  ~ update in-place
Terraform will perform the following actions:
  # module.datadog-operator[0].helm_release.datadog-operator will be updated in-place
  ~ resource "helm_release" "datadog-operator" {
        id                         = "datadog"
        name                       = "datadog"
      + pass_credentials           = false
      ~ version                    = "0.7.10" -> "0.8.6"
        # (26 unchanged attributes hidden)
    }
Plan: 0 to add, 1 to change, 0 to destroy.
module.datadog-operator[0].helm_release.datadog-operator: Modifying... [id=datadog]
module.datadog-operator[0].helm_release.datadog-operator: Still modifying... [id=datadog, 10s elapsed]
╷
│ Error: cannot patch "datadogagents.datadoghq.com" with kind CustomResourceDefinition: CustomResourceDefinition.apiextensions.k8s.io "datadogagents.datadoghq.com" is invalid: spec.validation: Forbidden: top-level and per-version schemas are mutually exclusive
│
│   with module.datadog-operator[0].helm_release.datadog-operator,
│   on modules/datadog-operator/main.tf line 62, in resource "helm_release" "datadog-operator":
│   62: resource "helm_release" "datadog-operator" {
│
╵
Releasing state lock. This may take a few moments...
ERRO[0131] 1 error occurred:
        * exit status 1
Describe what happened: Trying to update to the latest Helm chart to get newer agent versions.
Describe what you expected: The operator to apply successfully.
Steps to reproduce the issue: Upgrade from 0.7.10 -> 0.8.6.
Additional environment details (Operating System, Cloud provider, etc): AWS EKS 1.20, Terraform 1.0.8
Agent CRD config (templated with Terraform)
---
apiVersion: datadoghq.com/v1alpha1
kind: DatadogAgent
metadata:
  name: datadog
  namespace: ${namespace}
spec:
  agent:
    config:
      collectEvents: true
      leaderElection: true
      tolerations:
      - operator: Exists
      podLabelsAsTags: {
        "*": "kube_%%label%%"
      }
      tags:
      %{~ for tag in metric_tags ~}
        - ${tag}
      %{~ endfor ~}
    image:
      name: ${agent_image_tag}
    log:
      enabled: ${log_monitoring}
    process:
      enabled: true
      processCollectionEnabled: true
    systemProbe:
      enabled: true
    security:
      compliance:
        enabled: ${security_monitoring}
      runtime:
        enabled: ${security_monitoring}
  clusterAgent:
    enabled: true
    config:
      collectEvents: true
      clusterChecksEnabled: true
      externalMetrics:
        enabled: true
      volumeMounts:
        - mountPath: "/etc/datadog-agent/conf.d/mysql.d/"
          name: mysql-conf
          readOnly: true
        - mountPath: "/etc/datadog-agent/conf.d/postgres.d/"
          name: postgres-conf
          readOnly: true
      volumes:
        - name: mysql-conf
          projected:
            sources:
              - secret:
                  name: mysql-${db}-conf
        - name: postgres-conf
          projected:
            sources:
              - secret:
                  name: pgsql-${db}-conf
    image:
      name: ${cluster_agent_image_tag}
  clusterName: ${cluster_name}
  credentials:
    apiSecret:
      secretName: ${cred_secret_name}
      keyName: api-key
    appSecret:
      secretName: ${cred_secret_name}
      keyName: app-key
  features:
    kubeStateMetricsCore:
      enabled: true
    networkMonitoring:
      enabled: ${network_monitoring}
    orchestratorExplorer:
      enabled: true
      extraTags:
        - "datacenter:${datacenter}"
    prometheusScrape:
      enabled: false
  registry: public.ecr.aws/datadog
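For context, the template above is applied via Terraform's kubernetes_manifest resource, roughly like this (simplified; local.agent_template_vars is a stand-in for the real map of template inputs):
resource "kubernetes_manifest" "datadog-agent-operator" {
  # datadog-agent.yaml is the templated DatadogAgent spec shown above.
  # local.agent_template_vars stands in for the map of template inputs
  # (namespace, cluster_name, metric_tags, db, and so on).
  manifest = yamldecode(templatefile("${path.module}/datadog-agent.yaml", local.agent_template_vars))
}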
Datadog Operator config:
---
apiKeyExistingSecret: ${secret_name}
appKeyExistingSecret: ${secret_name}
datadogMonitor:
  enabled: ${enable_datadog_monitor}
supportExtendedDaemonset: ${enable_extended_daemonset}
registry: public.ecr.aws/datadog
watchNamespaces:
  - ""
I noticed the new CRD for DatadogAgent v2alpha1, but I wouldn't expect it to error at the operator level unless there was an issue with the CRD spec itself.
Update: I was able to update to 0.8.1 successfully. However, on any update after that I get a new error when deploying the DatadogAgent resource:
Attribute not found in schema
  with module.datadog-agent[0].kubernetes_manifest.datadog-agent-operator,
  on modules/datadog-agent/main.tf line 18, in resource "kubernetes_manifest" "datadog-agent-operator":
  18: resource "kubernetes_manifest" "datadog-agent-operator" {
Unable to find schema type for attribute:
spec.clusterAgent.config.volumes[0].ephemeral.readOnly
╷
│ Error: Failed to transform Tuple element into Tuple element type
│
│   with module.datadog-agent[0].kubernetes_manifest.datadog-agent-operator,
│   on modules/datadog-agent/main.tf line 18, in resource "kubernetes_manifest" "datadog-agent-operator":
│   18: resource "kubernetes_manifest" "datadog-agent-operator" {
│
│ Error (see above) at attribute:
│ spec.clusterAgent.config.volumes[0]
╵
╷
│ Error: Failed to transform Object element into Object element type
│
│   with module.datadog-agent[0].kubernetes_manifest.datadog-agent-operator,
│   on modules/datadog-agent/main.tf line 18, in resource "kubernetes_manifest" "datadog-agent-operator":
│   18: resource "kubernetes_manifest" "datadog-agent-operator" {
│
│ Error (see above) at attribute:
│ spec.clusterAgent.config.volumes
╵
╷
│ Error: Failed to transform Object element into Object element type
│
│   with module.datadog-agent[0].kubernetes_manifest.datadog-agent-operator,
│   on modules/datadog-agent/main.tf line 18, in resource "kubernetes_manifest" "datadog-agent-operator":
│   18: resource "kubernetes_manifest" "datadog-agent-operator" {
│
│ Error (see above) at attribute:
│ spec.clusterAgent.config
╵
╷
│ Error: Failed to transform Object element into Object element type
│
│   with module.datadog-agent[0].kubernetes_manifest.datadog-agent-operator,
│   on modules/datadog-agent/main.tf line 18, in resource "kubernetes_manifest" "datadog-agent-operator":
│   18: resource "kubernetes_manifest" "datadog-agent-operator" {
│
│ Error (see above) at attribute:
│ spec.clusterAgent
╵
╷
│ Error: Failed to transform Object element into Object element type
│
│   with module.datadog-agent[0].kubernetes_manifest.datadog-agent-operator,
│   on modules/datadog-agent/main.tf line 18, in resource "kubernetes_manifest" "datadog-agent-operator":
│   18: resource "kubernetes_manifest" "datadog-agent-operator" {
│
│ Error (see above) at attribute:
│ spec
Okay, I tried using the new schema to redeploy the agent, but it either isn't ready or I'm just misunderstanding exactly what the upgrade path looks like.
The operator at helm-chart version 0.8.6 does not install the v2alpha1 CRD. Trying to use the v2alpha1 spec while still marked as v1alpha1 also results in an error.
[~]$ kubectl api-resources | grep datadog
datadogagents                             dd            datadoghq.com/v1alpha1                 true         DatadogAgent
datadogmetrics                                          datadoghq.com/v1alpha1                 true         DatadogMetric
datadogmonitors                                         datadoghq.com/v1alpha1                 true         DatadogMonitor
It appears the CRD at datadog-operator/bundle/manifests/datadoghq.com_datadogagents.yaml is not updated to the latest CRD from the helm-chart repo, though I'm not sure if it's the source of truth for what gets bundled.
It is odd, though: the original error reported in this issue seems to indicate the new CRD spec is being patched. I'm just operating with limited information based on the errors presented to me.
I was able to completely unblock myself by upgrading the operator to 0.8.1 first, deleting the deployed DatadogAgent resource, and THEN redeploying the v1alpha1 spec after upgrading to 0.8.6.
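In Terraform terms, that sequence was roughly four applies changing two inputs (the variable names here are hypothetical, not what my modules actually use):
variable "operator_chart_version" {
  description = "datadog-operator Helm chart version passed to helm_release"
  type        = string
}

variable "deploy_datadog_agent" {
  description = "Gates the DatadogAgent kubernetes_manifest with count = var.deploy_datadog_agent ? 1 : 0"
  type        = bool
}

# Apply 1: operator_chart_version = "0.8.1", deploy_datadog_agent = true   (chart upgrade succeeds)
# Apply 2: operator_chart_version = "0.8.1", deploy_datadog_agent = false  (deletes the DatadogAgent resource)
# Apply 3: operator_chart_version = "0.8.6", deploy_datadog_agent = false  (chart upgrade to 0.8.6)
# Apply 4: operator_chart_version = "0.8.6", deploy_datadog_agent = true   (redeploys the v1alpha1 spec)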
Also curious about this.
To clarify, this was testing on another cluster starting from 0.7.10. I did some more thorough testing and wasn't able to upgrade straight to 0.8.6, even after deleting the datadogagents resource. I stepped down each version until I was able to upgrade to 0.8.3, which then allowed me to apply 0.8.6.
The 0.8.4 and 0.8.5 error differs from the straight-to-0.8.6 error:
│ Error: template: datadog-operator/charts/datadog-crds/templates/datadoghq.com_datadogmonitors_v1beta1.yaml:1:41: executing "datadog-operator/charts/datadog-crds/templates/datadoghq.com_datadogmonitors_v1beta1.yaml" at <semverCompare "<=21" .Capabilities.KubeVersion.Minor>: error calling semverCompare: Invalid Semantic Version
│
│   with module.datadog-operator[0].helm_release.datadog-operator,
│   on modules/datadog-operator/main.tf line 62, in resource "helm_release" "datadog-operator":
│   62: resource "helm_release" "datadog-operator" {
│
@GenPage So when I tried to upgrade from 0.8.0 to 0.8.6 I got this error:
Helm install failed: template: datadog-operator/charts/datadog-crds/templates/datadoghq.com_datadogmonitors_v1beta1.yaml:1:41: executing "datadog-operator/charts/datadog-crds/templates/datadoghq.com_datadogmonitors_v1beta1.yaml" at <semverCompare "<21" .Capabilities.KubeVersion.Minor>: error calling semverCompare: Invalid Semantic Version
So if I understand correctly, upgrading from 0.8.0 to 0.8.3 should be fine, and once that's done I can upgrade to 0.8.6.
Is that the upgrade path I should try? Also, do I still need to delete anything separately?
Yes, that's how I was able to get Helm to apply successfully. I did not delete anything.
Hello, sorry for not getting to this issue in time. Operator v0.x is no longer supported and we recommend migrating to the most recent version of v1.x.
Please open a new issue if there is anything blocking the migration or you experience the same issue in v1.x.