Readiness Probe for Kubernetes > 1.20.0

Open critchtionary opened this issue 1 year ago • 3 comments

Is this a request for help?: No


Is this a BUG REPORT or FEATURE REQUEST? (choose one): Feature Request

Version of Helm and Kubernetes: 3.10.1/1.24

Which chart: artifactory

Which product license (Enterprise/Pro/oss): Enterprise

I was wondering why readiness probes are only added to the Artifactory stateful set when the Kubernetes version is <1.20.0? Based on the changelog this was an intentional change.

I'm deploying onto K8s 1.24 and have had to patch in a readiness probe so that I can use minReadySeconds to prevent the pod being sent requests a couple of seconds before it's actually ready to handle them. This is not ideal as I'm having to duplicate the config for the check, which could break in a future version.
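For illustration, a patch along these lines is roughly what that workaround looks like. This is a minimal sketch, not the chart's own probe definition: the StatefulSet name, the container name, the timing values, and the use of an httpGet check against the router health endpoint on port 8082 are all assumptions and need to be adapted to the actual release.

```yaml
# readiness-patch.yaml (hypothetical file name), applied as a strategic merge patch
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: artifactory                # assumed; use the name generated by your release
spec:
  minReadySeconds: 10              # illustrative value; only has an effect once a readiness probe exists
  template:
    spec:
      containers:
        - name: artifactory        # assumed container name
          readinessProbe:
            httpGet:
              path: /router/api/v1/system/health   # assumed health endpoint
              port: 8082                           # assumed router port
            periodSeconds: 10      # illustrative timing values
            failureThreshold: 3
```

The drawback is the one described above: the probe definition is duplicated outside the chart and has to be kept in sync by hand on every chart upgrade.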

If there is a valid reason to have this disabled for >1.20.0, could it be implemented instead by setting the default value of enabled for the readiness probe in values.yaml based on the K8s version, so that people could more easily override it?
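As a rough sketch of what that could look like in the StatefulSet template: the value path `.Values.artifactory.readinessProbe`, the probe body, and the exact version constraint are assumptions for illustration, not the chart's current implementation. The idea is that the Kubernetes version only supplies the default, and an explicit `enabled` in values.yaml always wins.

```yaml
{{- /* Sketch only: value paths and probe body are assumptions, not the chart's actual template. */ -}}
{{- $probe := .Values.artifactory.readinessProbe | default dict }}
{{- /* Version-based default: on only for Kubernetes < 1.20.0, matching the behaviour described above. */ -}}
{{- $enabled := semverCompare "<1.20.0-0" .Capabilities.KubeVersion.Version }}
{{- /* An explicit enabled: true/false in values.yaml overrides the default. */ -}}
{{- if hasKey $probe "enabled" }}
{{- $enabled = $probe.enabled }}
{{- end }}
{{- if $enabled }}
readinessProbe:
  httpGet:
    path: /router/api/v1/system/health
    port: 8082
{{- end }}
```

With something like this, leaving `enabled` unset in values.yaml keeps today's behaviour, while setting `enabled: true` would add the probe on 1.24 without any patching.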

critchtionary avatar Mar 23 '23 11:03 critchtionary

@critchtionary Thanks for reporting! startupProbe is available on Kubernetes versions >= 1.20.0 (or on lower versions if you enable the feature flags in K8s). Hence, readinessProbe.initialDelaySeconds is set to zero for >= 1.20.0.

We will get back to you on your use case.

chukka avatar Mar 24 '23 09:03 chukka

This should be considered a bug.

We did in fact have an outage, as follows:

  • the nginx deployment uses the artifactory svc as upstream
  • the nginx pods use http://localhost:8082/router/api/v1/system/health as readiness check, i.e. the nginx readiness resolves to the upstream artifactory.svc router health ...
  • one of the artifactory replicas' router health checks started to fail due to an issue with its jfconnect service
  • but since the readiness check on this artifactory pod was missing, as described in this ticket, the unhealthy pod was not taken out of the svc endpoints

This made the HA setup fail with 503s despite the remaining pod being in a perfectly healthy state.

bramaq avatar Apr 24 '23 21:04 bramaq

We have had big issues with this. Readiness probes are effectively absent, and since Artifactory sometimes stalls and stops answering on some pods, the service keeps sending requests to the stalled pods. The only solution is to remove the version requirement. Same experience as @bramaq. I would also recommend an official fix.

nbgbankdata avatar Nov 06 '23 13:11 nbgbankdata