kic-reference-architectures

bug: OTEL Operator fails with Microk8s 1.24/stable

Open qdzlug opened this issue 2 years ago • 2 comments

Describe the bug
If you deploy on the most recent MicroK8s snap (1.24/stable), the Observability deployment fails with the following errors:

[2022-05-17T14:44:25.860Z]  +  kubernetes:apps/v1:Deployment opentelemetry-operator-system/opentelemetry-operator-controller-manager creating 

[2022-05-17T14:44:25.860Z]  +  kubernetes:apps/v1:Deployment opentelemetry-operator-system/opentelemetry-operator-controller-manager creating [1/2] Waiting for app ReplicaSet be marked available

[2022-05-17T14:44:26.426Z]  +  kubernetes:apps/v1:Deployment opentelemetry-operator-system/opentelemetry-operator-controller-manager creating warning: [MinimumReplicasUnavailable] Deployment does not have minimum availability.

[2022-05-17T14:44:26.426Z]  +  kubernetes:apps/v1:Deployment opentelemetry-operator-system/opentelemetry-operator-controller-manager creating [1/2] Waiting for app ReplicaSet be marked available (0/1 Pods available)

[2022-05-17T14:44:36.387Z]  +  kubernetes:apps/v1:Deployment opentelemetry-operator-system/opentelemetry-operator-controller-manager creating warning: [Pod opentelemetry-operator-system/opentelemetry-operator-controller-manager-56c4b5cbfd-vjhg2]: containers with unready status: [manager]

[2022-05-17T14:44:36.950Z] @ Updating....

[2022-05-17T14:44:36.951Z]  +  kubernetes:apps/v1:Deployment opentelemetry-operator-system/opentelemetry-operator-controller-manager creating Deployment initialization complete

[2022-05-17T14:44:36.951Z]  +  kubernetes:apps/v1:Deployment opentelemetry-operator-system/opentelemetry-operator-controller-manager created Deployment initialization complete

[2022-05-17T14:44:43.496Z]  +  kubernetes:core/v1:Service opentelemetry-operator-system/opentelemetry-operator-controller-manager-metrics-service creating Service initialization complete

[2022-05-17T14:44:43.496Z]  +  kubernetes:core/v1:Service opentelemetry-operator-system/opentelemetry-operator-controller-manager-metrics-service created Service initialization complete

[2022-05-17T14:44:43.496Z]  +  kubernetes:core/v1:Service opentelemetry-operator-system/opentelemetry-operator-webhook-service creating Service initialization complete

[2022-05-17T14:44:43.496Z]  +  kubernetes:core/v1:Service opentelemetry-operator-system/opentelemetry-operator-webhook-service created Service initialization complete

[2022-05-17T14:49:19.646Z] @ Updating.................

[2022-05-17T14:49:19.646Z]  +  kubernetes:core/v1:ServiceAccount opentelemetry-operator-system/opentelemetry-operator-controller-manager creating error: 1 error occurred:

[2022-05-17T14:49:19.646Z]  +  kubernetes:core/v1:ServiceAccount opentelemetry-operator-system/opentelemetry-operator-controller-manager **creating failed** error: 1 error occurred:

[2022-05-17T14:49:19.646Z]  +  pulumi:pulumi:Stack observability-marajenk29 creating error: update failed

[2022-05-17T14:49:19.646Z]  +  kubernetes:yaml:ConfigFile /jenkins/workspace/mara_mk8s_prod/pulumi/python/kubernetes/observability/otel-operator/opentelemetry-operator.yaml created 

[2022-05-17T14:49:19.646Z]  +  kubernetes:yaml:ConfigGroup otel-op created 

[2022-05-17T14:49:19.646Z]  +  pulumi:pulumi:Stack observability-marajenk29 **creating failed** 1 error; 2 messages

[2022-05-17T14:49:19.646Z]  

[2022-05-17T14:49:19.646Z] Diagnostics:

[2022-05-17T14:49:19.646Z]   pulumi:pulumi:Stack (observability-marajenk29):

[2022-05-17T14:49:19.646Z]     Cluster name: microk8s-cluster

[2022-05-17T14:49:19.647Z]     error: update failed

[2022-05-17T14:49:19.647Z]  

[2022-05-17T14:49:19.647Z]     I0517 14:44:21.389343   49967 request.go:665] Waited for 1.023910127s due to client-side throttling, not priority and fairness, request: POST:https://164.92.127.48:16443/apis/rbac.authorization.k8s.io/v1/clusterroles

[2022-05-17T14:49:19.647Z]  

[2022-05-17T14:49:19.647Z]   kubernetes:core/v1:ServiceAccount (opentelemetry-operator-system/opentelemetry-operator-controller-manager):

[2022-05-17T14:49:19.647Z]     error: 1 error occurred:

[2022-05-17T14:49:19.647Z]     	* resource opentelemetry-operator-system/opentelemetry-operator-controller-manager was successfully created, but the Kubernetes API server reported that it failed to fully initialize or become live: Timeout occurred polling for 'opentelemetry-operator-controller-manager'

[2022-05-17T14:49:19.647Z]  

[2022-05-17T14:49:19.647Z] Resources:

[2022-05-17T14:49:19.647Z]     + 22 created

[2022-05-17T14:49:19.647Z] 
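To make the failing resources easier to spot in a log like the one above, here is a minimal Python sketch that pulls out every line marked `**creating failed**`. It assumes the log keeps the same shape as this paste (bracketed timestamp, `+`, resource type, resource name, status); the names `FAILED_LINE` and `failed_resources` are hypothetical helpers, not part of MARA or Pulumi.

```python
import re

# Assumption: each failed create looks like the pasted lines, e.g.
# [ts]  +  kubernetes:core/v1:ServiceAccount ns/name **creating failed** error: ...
FAILED_LINE = re.compile(
    r"^\[(?P<timestamp>[^\]]+)\]\s+\+\s+"
    r"(?P<type>\S+)\s+(?P<name>\S+)\s+\*\*creating failed\*\*"
)


def failed_resources(log_text):
    """Return (timestamp, resource type, resource name) for each failed create."""
    failures = []
    for line in log_text.splitlines():
        match = FAILED_LINE.match(line.strip())
        if match:
            failures.append((match["timestamp"], match["type"], match["name"]))
    return failures
```

Run against the log above, this should report the ServiceAccount and the stack itself, which matches the Diagnostics section: the ServiceAccount timed out while being polled, and the stack failed as a consequence.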

To Reproduce
Steps to reproduce the behavior:

  1. Install the most recent MicroK8s version (1.24/stable)
  2. Deploy MARA

Expected behavior
The deployment should succeed, but it fails on 1.24. Rolling back to 1.23/stable fixes the issue.
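A pre-flight check could catch this before the deploy starts. The sketch below is a hypothetical Python helper, not something MARA ships: it parses a snap channel string such as `1.24/stable` and compares the track against the last version reported to work here (1.23). The constant `LAST_KNOWN_GOOD` and both function names are assumptions made for illustration.

```python
# Assumption: 1.23 is the newest MicroK8s track reported to work in this issue.
LAST_KNOWN_GOOD = (1, 23)


def parse_channel(channel):
    """Split a snap channel such as '1.24/stable' into ((1, 24), 'stable').

    Only numeric tracks are handled; a track like 'latest' raises ValueError.
    """
    track, _, risk = channel.partition("/")
    major, minor = (int(part) for part in track.split(".")[:2])
    # A channel given as a bare track is treated as the stable risk level.
    return (major, minor), risk or "stable"


def is_known_good(channel):
    """Return True if the channel's track is at or below the last known-good one."""
    version, _ = parse_channel(channel)
    return version <= LAST_KNOWN_GOOD
```

With these definitions, `is_known_good("1.23/stable")` is true and `is_known_good("1.24/stable")` is false, so a deploy script could refuse to continue (or just warn) on the newer channel.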

Your environment
n/a

Additional context
None

qdzlug • May 17 '22 21:05

#182 tracks this for our other deployments.

qdzlug • Aug 19 '22 15:08

For now, we are pinning MicroK8s to 1.23.x.

qdzlug • Aug 19 '22 15:08
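For anyone applying the same pin by hand, here is a minimal Python sketch that builds the snap command for the 1.23/stable channel. It only constructs the argument list; `snap_pin_command` is a hypothetical helper name, and the exact way the project's own scripts apply the pin is not shown in this thread.

```python
def snap_pin_command(channel="1.23/stable", already_installed=False):
    """Build the snap argv that puts MicroK8s on the given channel.

    A fresh install uses `snap install ... --classic`; an existing install
    is switched with `snap refresh`.
    """
    action = "refresh" if already_installed else "install"
    command = ["snap", action, "microk8s", f"--channel={channel}"]
    if not already_installed:
        # MicroK8s is a classic-confinement snap, so install needs --classic.
        command.append("--classic")
    return command
```

The result can be passed to `subprocess.run(..., check=True)`, usually with root privileges. Switching an existing cluster back from 1.24 to 1.23 with `snap refresh` is a downgrade, so it is worth trying on a disposable node first.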