
Node.js pod CrashLoopBackOff after auto-instrumenting

Starefossen opened this issue on Feb 21, 2024 • 0 comments

Component(s)

No response

What happened?

Description

The Node.js application enters CrashLoopBackOff when auto-instrumentation is enabled. The pod spec after injection:

apiVersion: v1
kind: Pod
metadata:
  annotations:
    instrumentation.opentelemetry.io/container-names: unleash
    instrumentation.opentelemetry.io/inject-nodejs: my-system/management-features
  creationTimestamp: "2024-02-21T19:48:21Z"
  generateName: my-demo-b64c6d87b-
  labels:
    app.kubernetes.io/created-by: controller-manager
    app.kubernetes.io/instance: my-demo
    app.kubernetes.io/name: Unleash
    app.kubernetes.io/part-of: unleasherator
    pod-template-hash: b64c6d87b
  name: my-demo-b64c6d87b-w72hh
  namespace: bifrost-unleash
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: ReplicaSet
    name: my-demo-b64c6d87b
    uid: bd441ad4-6ada-4e92-b2fd-ec17a9ae9a44
  resourceVersion: "520000502"
  uid: 4a93c92e-3df2-4fc3-a611-1bdd4abe1ee6
spec:
  containers:
  - env:
    - name: INIT_ADMIN_API_TOKENS
      valueFrom:
        secretKeyRef:
          key: token
          name: unleasherator-my-demo-admin-key
    - name: DATABASE_PASS
      valueFrom:
        secretKeyRef:
          key: POSTGRES_PASSWORD
          name: my-demo
    - name: DATABASE_USER
      valueFrom:
        secretKeyRef:
          key: POSTGRES_USER
          name: my-demo
    - name: DATABASE_NAME
      valueFrom:
        secretKeyRef:
          key: POSTGRES_DB
          name: my-demo
    - name: DATABASE_HOST
      value: localhost
    - name: DATABASE_PORT
      value: "5432"
    - name: DATABASE_SSL
      value: "false"
    - name: DATABASE_URL
      value: postgres://$(DATABASE_USER):$(DATABASE_PASS)@$(DATABASE_HOST):$(DATABASE_PORT)/$(DATABASE_NAME)
    - name: GOOGLE_IAP_AUDIENCE
      value: /projects/898056957967/global/backendServices/6771496285844745965
    - name: TEAMS_API_URL
      value: http://teams-backend.my-system.svc/query
    - name: TEAMS_API_TOKEN
      valueFrom:
        secretKeyRef:
          key: token
          name: teams-api-token
    - name: TEAMS_ALLOWED_TEAMS
      value: aura,frontendplattform
    - name: LOG_LEVEL
      value: warn
    - name: DATABASE_POOL_MAX
      value: "3"
    - name: DATABASE_POOL_IDLE_TIMEOUT_MS
      value: "1000"
    - name: NODE_OPTIONS
      value: ' --require /otel-auto-instrumentation-nodejs/autoinstrumentation.js'
    - name: OTEL_SERVICE_NAME
      value: my-demo
    - name: OTEL_EXPORTER_OTLP_ENDPOINT
      value: http://opentelemetry-management-collector.my-system:4317
    - name: OTEL_RESOURCE_ATTRIBUTES_POD_NAME
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: metadata.name
    - name: OTEL_RESOURCE_ATTRIBUTES_NODE_NAME
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: spec.nodeName
    - name: OTEL_PROPAGATORS
      value: tracecontext,baggage,b3
    - name: OTEL_RESOURCE_ATTRIBUTES
      value: k8s.container.name=unleash,k8s.deployment.name=my-demo,k8s.namespace.name=bifrost-unleash,k8s.node.name=$(OTEL_RESOURCE_ATTRIBUTES_NODE_NAME),k8s.pod.name=$(OTEL_RESOURCE_ATTRIBUTES_POD_NAME),k8s.replicaset.name=my-demo-b64c6d87b,service.version=v5.8.2-20240130-115753-fd5cd41
    image: europe-north1-docker.pkg.dev/my-io/my/images/unleash-v4:v5.8.2-20240130-115753-fd5cd41
    imagePullPolicy: Always
    livenessProbe:
      failureThreshold: 3
      httpGet:
        path: /health
        port: 4242
        scheme: HTTP
      initialDelaySeconds: 5
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 10
    name: unleash
    ports:
    - containerPort: 4242
      name: http
      protocol: TCP
    resources:
      limits:
        memory: 256Mi
      requests:
        cpu: 100m
        memory: 128Mi
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
        - ALL
      runAsNonRoot: true
      runAsUser: 1001
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-xfgkk
      readOnly: true
    - mountPath: /otel-auto-instrumentation-nodejs
      name: opentelemetry-auto-instrumentation-nodejs
  - args:
    - --structured-logs
    - --port=5432
    - my-management-233d:europe-north1:bifrost-3de70742
    image: gcr.io/cloud-sql-connectors/cloud-sql-proxy:2.1.0
    imagePullPolicy: IfNotPresent
    name: sql-proxy
    resources:
      limits:
        memory: 100Mi
      requests:
        cpu: 10m
        memory: 100Mi
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
        - ALL
      privileged: false
      runAsNonRoot: true
      runAsUser: 65532
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-xfgkk
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  initContainers:
  - command:
    - cp
    - -a
    - /autoinstrumentation/.
    - /otel-auto-instrumentation-nodejs
    image: ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-nodejs:0.46.0
    imagePullPolicy: IfNotPresent
    name: opentelemetry-auto-instrumentation-nodejs
    resources:
      limits:
        cpu: 500m
        memory: 128Mi
      requests:
        cpu: 50m
        memory: 128Mi
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
        - ALL
      runAsNonRoot: true
      runAsUser: 1001
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /otel-auto-instrumentation-nodejs
      name: opentelemetry-auto-instrumentation-nodejs
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-xfgkk
      readOnly: true
  nodeName: gke-my-management--nap-e2-standard--7ff15a1a-9bqj
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Always
  schedulerName: gke.io/optimize-utilization-scheduler
  securityContext:
    runAsNonRoot: true
    seccompProfile:
      type: RuntimeDefault
  serviceAccount: bifrost-unleash-sql-user
  serviceAccountName: bifrost-unleash-sql-user
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - name: kube-api-access-xfgkk
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          items:
          - key: ca.crt
            path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace
  - emptyDir:
      sizeLimit: 200Mi
    name: opentelemetry-auto-instrumentation-nodejs

Steps to Reproduce

Enable auto-instrumentation for a Node.js application like the one above.
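The Instrumentation resource referenced by the instrumentation.opentelemetry.io/inject-nodejs: my-system/management-features annotation is not included in the report. A minimal sketch of what it might look like, reconstructed from the environment variables the operator injected into the pod above (the exporter endpoint and propagators are taken from the pod spec; everything else is an assumption):

apiVersion: opentelemetry.io/v1alpha1
kind: Instrumentation
metadata:
  name: management-features
  namespace: my-system
spec:
  # Taken from OTEL_EXPORTER_OTLP_ENDPOINT in the injected pod spec
  exporter:
    endpoint: http://opentelemetry-management-collector.my-system:4317
  # Taken from OTEL_PROPAGATORS in the injected pod spec
  propagators:
    - tracecontext
    - baggage
    - b3
  nodejs: {}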

Expected Result

The pod should start normally with the instrumentation injected.

Actual Result

It fails to start with the following error:

cp: can't preserve ownership of '...': Operation not permitted
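The -a flag passed to cp in the injected init container implies preserving file ownership. The files in the autoinstrumentation image are owned by root, but the init container runs as UID 1001 with all capabilities dropped, so the chown on each copied file fails with EPERM, cp exits non-zero, the init container fails, and the pod ends up in CrashLoopBackOff. A minimal sketch that reproduces the same failure outside Kubernetes, assuming busybox cp semantics (which match the error format above) and a local Docker daemon:

# Mirror the injected init container's securityContext: non-root UID,
# all capabilities dropped, copying root-owned files with -a
# (which preserves ownership and therefore needs CAP_CHOWN).
docker run --rm --user 1001:1001 --cap-drop ALL busybox \
  sh -c 'mkdir -p /tmp/out && cp -a /bin/. /tmp/out'
# Expected: one warning per file (filenames will vary) and a non-zero exit:
#   cp: can't preserve ownership of '/tmp/out/./ls': Operation not permitted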

Kubernetes Version

v1.28.3

Operator version

0.93.0

Collector version

latest

Environment information

Environment

Cloud: GKE

Log output

cp: can't preserve ownership of '/otel-auto-instrumentation-nodejs/./autoinstrumentation.js': Operation not permitted
cp: can't preserve ownership of '/otel-auto-instrumentation-nodejs/./autoinstrumentation.d.ts.map': Operation not permitted
...

Additional context

No response
