ingress-gce

Ingress fails to infer health check parameters from readiness check

rnett opened this issue on Jul 28 '22

From the docs, I understand that the Ingress is supposed to infer health check parameters from the Pod's readiness probe. This does not happen with the following setup:

Ingress:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    ingress.kubernetes.io/backends: '{"k8s-be-32106--f220b85bdab142a3":"HEALTHY","k8s1-f220b85b-gradle-enterprise-gradle-proxy-80-03db6b55":"UNHEALTHY"}'
    ingress.kubernetes.io/forwarding-rule: k8s2-fr-38rm41cl-gradle-enterpri-gradle-enterprise-ing-t31pria0
    ingress.kubernetes.io/https-forwarding-rule: k8s2-fs-38rm41cl-gradle-enterpri-gradle-enterprise-ing-t31pria0
    ingress.kubernetes.io/https-target-proxy: k8s2-ts-38rm41cl-gradle-enterpri-gradle-enterprise-ing-t31pria0
    ingress.kubernetes.io/ssl-cert: k8s2-cr-38rm41cl-1oboxxmuitbrur1q-83ef7b8318992d04
    ingress.kubernetes.io/target-proxy: k8s2-tp-38rm41cl-gradle-enterpri-gradle-enterprise-ing-t31pria0
    ingress.kubernetes.io/url-map: k8s2-um-38rm41cl-gradle-enterpri-gradle-enterprise-ing-t31pria0
    kubernetes.io/ingress.global-static-ip-name: ge-static-ip
    meta.helm.sh/release-name: ge
    meta.helm.sh/release-namespace: gradle-enterprise
  creationTimestamp: "2022-07-28T22:12:26Z"
  finalizers:
  - networking.gke.io/ingress-finalizer-V2
  generation: 1
  labels:
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/part-of: gradle-enterprise
    app.kubernetes.io/version: 2022.2.7
  name: gradle-enterprise-ingress
  namespace: gradle-enterprise
  resourceVersion: "59387"
  uid: 3c51a910-12ed-40c4-9468-5490b3fc3d72
spec:
  rules:
  - host: gradle-enterprise-226ee2a0.nip.io
    http:
      paths:
      - backend:
          service:
            name: gradle-proxy
            port:
              number: 80
        path: /
        pathType: Prefix
  tls:
  - hosts:
    - gradle-enterprise-226ee2a0.nip.io
    secretName: gradle-ingress-ssl-secret
status:
  loadBalancer:
    ingress:
    - ip: 34.110.226.160

Service:

apiVersion: v1
kind: Service
metadata:
  annotations:
    cloud.google.com/neg: '{"ingress":true}'
    cloud.google.com/neg-status: '{"network_endpoint_groups":{"80":"k8s1-f220b85b-gradle-enterprise-gradle-proxy-80-03db6b55"},"zones":["us-west1-a","us-west1-c"]}'
    meta.helm.sh/release-name: ge
    meta.helm.sh/release-namespace: gradle-enterprise
  creationTimestamp: "2022-07-28T22:12:24Z"
  labels:
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/part-of: gradle-enterprise
    app.kubernetes.io/version: 2022.2.7
  managedFields:
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          f:cloud.google.com/neg-status: {}
    manager: glbc
    operation: Update
    subresource: status
    time: "2022-07-28T22:12:26Z"
  name: gradle-proxy
  namespace: gradle-enterprise
  resourceVersion: "55512"
  uid: 0f515b48-6a38-45f2-96ad-9f62dfd8f0bf
spec:
  clusterIP: 10.91.129.152
  clusterIPs:
  - 10.91.129.152
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  ports:
  - name: health
    port: 777
    protocol: TCP
    targetPort: 7777
  - name: http
    port: 80
    protocol: TCP
    targetPort: 9080
  - name: https
    port: 443
    protocol: TCP
    targetPort: 9443
  selector:
    app: gradle-enterprise
    component: proxy
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}

Pod:

apiVersion: v1
kind: Pod
metadata:
  annotations:
    checksum.common/env-vars: 886b0459918881a653ed709877ffa1b13a1c55643877f3afa7a2b4614986cae2
    checksum.common/image-pull-secret: abcc3e680b8348953e2480b1b7a839908c27620630c4631788ef29e30e864a6f
    checksum.common/license: 47d1e5ef3c00e167e8946d396e99b47af8d20e5a0f4e5141355511127c711919
    checksum.common/unattended-configuration: 01ba4719c80b6fe911b091a7c05124b64eeece964e09c058ef8f9805daca546b
    checksum.proxy/config: cb0edd24efcfb184f23c0b78313a919ef790ebdbbf1c96e73952debcad101e90
    checksum.proxy/ssl-secret: 01ba4719c80b6fe911b091a7c05124b64eeece964e09c058ef8f9805daca546b
    seccomp.security.alpha.kubernetes.io/pod: runtime/default
  creationTimestamp: "2022-07-28T22:12:28Z"
  generateName: gradle-proxy-
  labels:
    app: gradle-enterprise
    app.kubernetes.io/component: proxy
    app.kubernetes.io/part-of: gradle-enterprise
    component: proxy
    controller-revision-hash: gradle-proxy-6c9db65f4d
    statefulset.kubernetes.io/pod-name: gradle-proxy-0
  name: gradle-proxy-0
  namespace: gradle-enterprise
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: StatefulSet
    name: gradle-proxy
    uid: b6ff1d2d-ca00-4dcf-a983-9c99c524af8a
  resourceVersion: "55803"
  uid: 6caa006c-4559-4c7c-88a3-6a7a711bfdd4
spec:
  containers:
  - env:
    - name: SSL_ENABLED
      valueFrom:
        configMapKeyRef:
          key: enable.ssl
          name: gradle-env-vars-config
    - name: GE_READINESS__ENTERPRISE_APP
      value: http://gradle-enterprise-app:8086/info/version
    - name: GE_READINESS__BUILD_CACHE_NODE
      value: http://gradle-build-cache-node:8087/cache-node-info/version
    - name: GE_READINESS__KEYCLOAK
      value: http://gradle-keycloak:8083/keycloak/realms/gradle-enterprise
    - name: GE_READINESS__TEST_DISTRIBUTION
      value: http://gradle-test-distribution-broker:8084/distribution-broker-info/version
    image: registry.gradle.com/gradle-enterprise/gradle-proxy-image:2022.2.7
    imagePullPolicy: IfNotPresent
    livenessProbe:
      exec:
        command:
        - /bin/sh
        - -c
        - exec supervisorctl status nginx | grep RUNNING
      failureThreshold: 6
      initialDelaySeconds: 5
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 5
    name: gradle-proxy
    ports:
    - containerPort: 7777
      name: health
      protocol: TCP
    - containerPort: 9080
      name: http
      protocol: TCP
    - containerPort: 9443
      name: https
      protocol: TCP
    readinessProbe:
      failureThreshold: 6
      httpGet:
        path: /health
        port: health
        scheme: HTTP
      initialDelaySeconds: 5
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 5
    resources:
      limits:
        cpu: 250m
        ephemeral-storage: 1Gi
        memory: 512Mi
      requests:
        cpu: 250m
        ephemeral-storage: 1Gi
        memory: 512Mi
    securityContext:
      capabilities:
        drop:
        - NET_RAW
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /opt/certs
      name: ssl-certs
    - mountPath: /etc/nginx/conf.d
      name: config
    - mountPath: /opt/gradle/data/logs
      name: logs
      subPath: proxy
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-dpq7b
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  hostname: gradle-proxy-0
  imagePullSecrets:
  - name: gradle-enterprise-image-pull-secret
  nodeName: gk3-gradle-enterprise-default-pool-0de741e1-4vkx
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  readinessGates:
  - conditionType: cloud.google.com/load-balancer-neg-ready
  restartPolicy: Always
  schedulerName: gke.io/optimize-utilization-scheduler
  securityContext:
    fsGroup: 0
    runAsGroup: 0
    runAsUser: 999
    seccompProfile:
      type: RuntimeDefault
  serviceAccount: default
  serviceAccountName: default
  subdomain: gradle-proxy
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - name: logs
    persistentVolumeClaim:
      claimName: logs-gradle-proxy-0
  - name: ssl-certs
    secret:
      defaultMode: 420
      optional: true
      secretName: gradle-proxy-ssl-secret
  - configMap:
      defaultMode: 420
      name: gradle-proxy-config
    name: config
  - name: kube-api-access-dpq7b
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          items:
          - key: ca.crt
            path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: null
    message: 'Pod is in NEG "Key{\"k8s1-f220b85b-gradle-enterprise-gradle-proxy-80-03db6b55\",
      zone: \"us-west1-c\"}". NEG is not attached to any BackendService with health
      checking. Marking condition "cloud.google.com/load-balancer-neg-ready" to True.'
    reason: LoadBalancerNegWithoutHealthCheck
    status: "True"
    type: cloud.google.com/load-balancer-neg-ready
  - lastProbeTime: null
    lastTransitionTime: "2022-07-28T22:12:28Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2022-07-28T22:12:48Z"
    status: "True"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2022-07-28T22:12:48Z"
    status: "True"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2022-07-28T22:12:28Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: containerd://f7764b21637224afc83771a7779c54b7d31b9d2b0c4bc709ee2b3c637b6a8a51
    image: registry.gradle.com/gradle-enterprise/gradle-proxy-image:2022.2.7
    imageID: registry.gradle.com/gradle-enterprise/gradle-proxy-image@sha256:758fb79672d1fb1cd46f06e9bc0b763fe4808c3c6f8b2497d8d126cb227a55e9
    lastState: {}
    name: gradle-proxy
    ready: true
    restartCount: 0
    started: true
    state:
      running:
        startedAt: "2022-07-28T22:12:37Z"
  hostIP: 10.138.0.35
  phase: Running
  podIP: 10.91.0.10
  podIPs:
  - ip: 10.91.0.10
  qosClass: Guaranteed
  startTime: "2022-07-28T22:12:28Z"

This results in a health check like this: [screenshot]

As far as I know, it should be hitting /health on port 777 (or on 7777 if it goes directly to the Pod).
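To make the port relationships in the manifests above explicit, here is a minimal Python sketch that cross-references values copied from them. The dictionaries and helper names are hypothetical; this only resolves the YAML by hand and does not reproduce ingress-gce's actual inference logic.

```python
# Port-related fields copied from the Ingress, Service, and Pod manifests above.
# All names here are hypothetical; this is not ingress-gce's inference logic.

# Pod: containers[0].ports (name -> containerPort)
POD_CONTAINER_PORTS = {"health": 7777, "http": 9080, "https": 9443}

# Service gradle-proxy: spec.ports as (name, port, targetPort)
SERVICE_PORTS = [
    ("health", 777, 7777),
    ("http", 80, 9080),
    ("https", 443, 9443),
]

# Pod: containers[0].readinessProbe.httpGet
READINESS_PROBE = {"path": "/health", "port": "health", "scheme": "HTTP"}

# Ingress: spec.rules[0].http.paths[0].backend.service.port.number
INGRESS_BACKEND_PORT = 80


def resolve_container_port(port):
    """Resolve a probe port given by name or number to a numeric containerPort."""
    if isinstance(port, int):
        return port
    return POD_CONTAINER_PORTS[port]


def service_port_for_target(target_port):
    """Return the Service port whose targetPort equals target_port, or None."""
    for _name, port, target in SERVICE_PORTS:
        if target == target_port:
            return port
    return None


def target_for_service_port(service_port):
    """Return the targetPort behind a given Service port, or None."""
    for _name, port, target in SERVICE_PORTS:
        if port == service_port:
            return target
    return None


probe_port = resolve_container_port(READINESS_PROBE["port"])
serving_port = target_for_service_port(INGRESS_BACKEND_PORT)

print(f"readiness probe: {READINESS_PROBE['path']} on containerPort {probe_port} "
      f"(Service port {service_port_for_target(probe_port)})")
print(f"Ingress backend: Service port {INGRESS_BACKEND_PORT} -> containerPort {serving_port}")
print(f"probe on serving port: {probe_port == serving_port}")
```

It shows the readiness probe resolving to containerPort 7777 (Service port 777), while the Ingress backend resolves to containerPort 9080. That difference may be worth comparing against the conditions the GKE documentation lists for copying health check parameters from a readiness probe.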

This may be related to the Pod's cloud.google.com/load-balancer-neg-ready condition having the reason LoadBalancerNegWithoutHealthCheck, but since a health check does exist and I couldn't find any documentation on that status, I'm not sure.
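The LoadBalancerNegWithoutHealthCheck reason appears on the Pod's cloud.google.com/load-balancer-neg-ready readiness-gate condition, whose message above says the NEG "is not attached to any BackendService with health checking". A minimal Python sketch for reading that condition, assuming it is given the JSON form of the Pod (for example the output of kubectl get pod gradle-proxy-0 -n gradle-enterprise -o json); the embedded sample is an abridged copy of the status above and the function name is hypothetical:

```python
import json

# Abridged copy of the gradle-proxy-0 status conditions shown above.
# In practice this JSON could come from:
#   kubectl get pod gradle-proxy-0 -n gradle-enterprise -o json
POD_JSON = """
{"status": {"conditions": [
  {"type": "cloud.google.com/load-balancer-neg-ready", "status": "True",
   "reason": "LoadBalancerNegWithoutHealthCheck"},
  {"type": "Ready", "status": "True"},
  {"type": "ContainersReady", "status": "True"}
]}}
"""

NEG_READY = "cloud.google.com/load-balancer-neg-ready"


def neg_readiness(pod):
    """Return (status, reason) of the NEG readiness-gate condition, or None if absent."""
    for cond in pod.get("status", {}).get("conditions", []):
        if cond.get("type") == NEG_READY:
            return cond.get("status"), cond.get("reason")
    return None


status, reason = neg_readiness(json.loads(POD_JSON))
print(f"{NEG_READY}: status={status} reason={reason}")
if reason == "LoadBalancerNegWithoutHealthCheck":
    # Paraphrases the condition message: the gate was set to True without a health check.
    print("NEG is reported as not attached to any health-checked BackendService")
```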

rnett, Jul 28 '22

/assign @swetharepakula

/kind support

swetharepakula, Sep 12 '22

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
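The three thresholds compose into a fixed timeline. As a sketch, assuming the bot acts as soon as each threshold is reached and counting from the last maintainer activity (the /assign comment on Sep 12 '22), Python's datetime module reproduces the dates of the bot comments in this thread; lifecycle_dates is a hypothetical helper, not part of the bot:

```python
from datetime import date, timedelta

# Inactivity thresholds as described in the triage rules above.
STALE_AFTER = timedelta(days=90)   # last activity -> lifecycle/stale
ROTTEN_AFTER = timedelta(days=30)  # lifecycle/stale -> lifecycle/rotten
CLOSE_AFTER = timedelta(days=30)   # lifecycle/rotten -> closed


def lifecycle_dates(last_activity):
    """Dates of each transition, assuming no further activity and no bot delay."""
    stale = last_activity + STALE_AFTER
    rotten = stale + ROTTEN_AFTER
    closed = rotten + CLOSE_AFTER
    return stale, rotten, closed


for label, d in zip(("stale", "rotten", "closed"), lifecycle_dates(date(2022, 9, 12))):
    print(label, d.isoformat())
# stale 2022-12-11
# rotten 2023-01-10
# closed 2023-02-09
```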

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot, Dec 11 '22

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot, Jan 10 '23

Could gRPC probes be the solution for health checks? https://kubernetes.io/blog/2022/05/13/grpc-probes-now-in-beta/#trying-the-feature-out

chamini2, Jan 10 '23

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-triage-robot, Feb 09 '23

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot, Feb 09 '23