kuberhealthy icon indicating copy to clipboard operation
kuberhealthy copied to clipboard

deployment-check failing intermittently - "Got a 401 : While connecting to deployment-svc"

Open dushyy opened this issue 2 years ago • 2 comments

Hello,

deployment-checks are frequently failing on my cluster (atleast 3/4 times a day). From the logs new deployments are being created fine and only fails when connecting to deployment-svc.

Kuberhealthy is deployed using helm chart

$ helm list -nkuberhealthy
NAME            NAMESPACE       REVISION        UPDATED                                 STATUS          CHART           APP VERSION
kuberhealthy    kuberhealthy    4               2022-12-14 18:40:27.3898512 +0530 IST   deployed        kuberhealthy-87 v2.7.1

I checked the diff between chart version 87 and latest (90) and cant find any changes that would be relevant.

Any suggestions how to debug this further ? Logs below.

TIA

Logs for failed deployments:

$ k logs deployment-1671194195 -nkuberhealthy
time="2022-12-16T12:36:43Z" level=info msg="Found instance namespace: kuberhealthy"
time="2022-12-16T12:36:43Z" level=info msg="Kuberhealthy is located in the kuberhealthy namespace."
time="2022-12-16T12:36:43Z" level=info msg="Debug logging enabled."
time="2022-12-16T12:36:43Z" level=debug msg="[/app/deployment-check]"
time="2022-12-16T12:36:43Z" level=info msg="Parsed CHECK_IMAGE: local.imageregistry.com/nginxinc/nginx-unprivileged:1.17.8"
time="2022-12-16T12:36:43Z" level=info msg="Parsed CHECK_IMAGE_ROLL_TO: local.imageregistry.com/nginxinc/nginx-unprivileged:1.17.9"
time="2022-12-16T12:36:43Z" level=info msg="Found pod namespace: kuberhealthy"
time="2022-12-16T12:36:43Z" level=info msg="Performing check in kuberhealthy namespace."
time="2022-12-16T12:36:43Z" level=info msg="Parsed CHECK_DEPLOYMENT_REPLICAS: 2"
time="2022-12-16T12:36:43Z" level=info msg="Parsed CHECK_SERVICE_ACCOUNT: default"
time="2022-12-16T12:36:43Z" level=info msg="Check time limit set to: 14m46.760673918s"
time="2022-12-16T12:36:43Z" level=info msg="Parsed CHECK_DEPLOYMENT_ROLLING_UPDATE: true"
time="2022-12-16T12:36:43Z" level=info msg="Check deployment image will be rolled from [local.imageregistry.com/nginxinc/nginx-unprivileged:1.17.8] to [local.imageregistry.com/nginxinc/nginx-unprivileged:1.17.9]"
time="2022-12-16T12:36:43Z" level=debug msg="Allowing this check 14m46.760673918s to finish."
time="2022-12-16T12:36:43Z" level=info msg="Kubernetes client created."
time="2022-12-16T12:36:43Z" level=info msg="Waiting for node to become ready before starting check."
time="2022-12-16T12:36:43Z" level=debug msg="Checking if the kuberhealthy endpoint: http://kuberhealthy.kuberhealthy.svc.cluster.local/externalCheckStatus is ready."
time="2022-12-16T12:36:43Z" level=debug msg="http://kuberhealthy.kuberhealthy.svc.cluster.local/externalCheckStatus is ready."
time="2022-12-16T12:36:43Z" level=debug msg="Kuberhealthy endpoint: http://kuberhealthy.kuberhealthy.svc.cluster.local/externalCheckStatus is ready. Proceeding to run check."
time="2022-12-16T12:36:43Z" level=info msg="Starting check."
time="2022-12-16T12:36:43Z" level=info msg="Wiping all found orphaned resources belonging to this check."
time="2022-12-16T12:36:43Z" level=info msg="Attempting to find previously created service(s) belonging to this check."
time="2022-12-16T12:36:43Z" level=debug msg="Found 1 service(s)."
time="2022-12-16T12:36:43Z" level=debug msg="Service: kuberhealthy"
time="2022-12-16T12:36:43Z" level=info msg="Did not find any old service(s) belonging to this check."
time="2022-12-16T12:36:43Z" level=info msg="Attempting to find previously created deployment(s) belonging to this check."
time="2022-12-16T12:36:44Z" level=debug msg="Found 1 deployment(s)"
time="2022-12-16T12:36:44Z" level=debug msg=kuberhealthy
time="2022-12-16T12:36:44Z" level=info msg="Did not find any old deployment(s) belonging to this check."
time="2022-12-16T12:36:44Z" level=info msg="Successfully cleaned up prior check resources."
time="2022-12-16T12:36:44Z" level=info msg="Creating deployment resource with 2 replica(s) in kuberhealthy namespace using image [local.imageregistry.com/nginxinc/nginx-unprivileged:1.17.8] with environment variables: map[]"
time="2022-12-16T12:36:44Z" level=info msg="Creating container using image [local.imageregistry.com/nginxinc/nginx-unprivileged:1.17.8] with environment variables: map[]"
time="2022-12-16T12:36:44Z" level=info msg="Created deployment resource."
time="2022-12-16T12:36:44Z" level=info msg="Creating deployment in cluster with name: deployment-deployment"
time="2022-12-16T12:36:44Z" level=info msg="Watching for deployment to exist."
time="2022-12-16T12:36:44Z" level=debug msg="Received an event watching for deployment changes: deployment-deployment got event ADDED"
time="2022-12-16T12:36:47Z" level=debug msg="Received an event watching for deployment changes: deployment-deployment got event MODIFIED"
time="2022-12-16T12:36:48Z" level=debug msg="Received an event watching for deployment changes: deployment-deployment got event MODIFIED"
time="2022-12-16T12:36:53Z" level=debug msg="Received an event watching for deployment changes: deployment-deployment got event MODIFIED"
time="2022-12-16T12:36:53Z" level=info msg="Deployment is reporting Available with True."
time="2022-12-16T12:36:53Z" level=info msg="Created deployment in kuberhealthy namespace: deployment-deployment"
time="2022-12-16T12:36:53Z" level=info msg="Creating service resource for kuberhealthy namespace."
time="2022-12-16T12:36:53Z" level=info msg="Created service resource."
time="2022-12-16T12:36:53Z" level=info msg="Creating service in cluster with name: deployment-svc"
time="2022-12-16T12:36:53Z" level=info msg="Watching for service to exist."
time="2022-12-16T12:36:53Z" level=debug msg="Received an event watching for service changes: ADDED"
time="2022-12-16T12:36:53Z" level=info msg="Cluster IP found: 10.x.x.xx"
time="2022-12-16T12:36:53Z" level=info msg="Created service in kuberhealthy namespace: deployment-svc"
time="2022-12-16T12:36:53Z" level=debug msg="Retrieving a cluster IP belonging to: deployment-svc"
time="2022-12-16T12:36:53Z" level=info msg="Found service cluster IP address: 10.x.x.xx"
time="2022-12-16T12:36:53Z" level=info msg="Looking for a response from the endpoint."
time="2022-12-16T12:36:53Z" level=debug msg="Setting timeout for backoff loop to: 3m0s"
time="2022-12-16T12:36:53Z" level=info msg="Beginning backoff loop for HTTP GET request."
time="2022-12-16T12:36:53Z" level=debug msg="Making GET to http://10.x.x.xx"
time="2022-12-16T12:36:53Z" level=debug msg="Got a 401"
time="2022-12-16T12:36:53Z" level=info msg="Retrying in 5 seconds."
time="2022-12-16T12:36:58Z" level=error msg="error occurred making request to service in cluster: could not get a response from the given address: http://10.x.x.xx"
time="2022-12-16T12:36:58Z" level=info msg="Cleaning up deployment and service."
time="2022-12-16T12:36:58Z" level=info msg="Attempting to delete service deployment-svc in kuberhealthy namespace."
time="2022-12-16T12:36:58Z" level=debug msg="Checking if service has been deleted."
time="2022-12-16T12:36:58Z" level=debug msg="Delete service and wait has not yet timed out."
time="2022-12-16T12:36:58Z" level=debug msg="Waiting 5 seconds before trying again."
time="2022-12-16T12:37:03Z" level=info msg="Attempting to delete deployment in kuberhealthy namespace."
time="2022-12-16T12:37:03Z" level=debug msg="Checking if deployment has been deleted."
time="2022-12-16T12:37:03Z" level=debug msg="Delete deployment and wait has not yet timed out."
time="2022-12-16T12:37:03Z" level=debug msg="Waiting 5 seconds before trying again."
time="2022-12-16T12:37:08Z" level=info msg="Finished clean up process."
time="2022-12-16T12:37:08Z" level=error msg="Reporting errors to Kuberhealthy: [could not get a response from the given address: http://10.x.x.xx]"

Log for successful deployments:

$ k logs deployment-1671199595 -nkuberhealthy
time="2022-12-16T14:06:43Z" level=info msg="Found instance namespace: kuberhealthy"
time="2022-12-16T14:06:43Z" level=info msg="Kuberhealthy is located in the kuberhealthy namespace."
time="2022-12-16T14:06:43Z" level=info msg="Debug logging enabled."
time="2022-12-16T14:06:43Z" level=debug msg="[/app/deployment-check]"
time="2022-12-16T14:06:43Z" level=info msg="Parsed CHECK_IMAGE: local.imageregistry.com/nginxinc/nginx-unprivileged:1.17.8"
time="2022-12-16T14:06:43Z" level=info msg="Parsed CHECK_IMAGE_ROLL_TO: local.imageregistry.com/nginxinc/nginx-unprivileged:1.17.9"
time="2022-12-16T14:06:43Z" level=info msg="Found pod namespace: kuberhealthy"
time="2022-12-16T14:06:43Z" level=info msg="Performing check in kuberhealthy namespace."
time="2022-12-16T14:06:43Z" level=info msg="Parsed CHECK_DEPLOYMENT_REPLICAS: 2"
time="2022-12-16T14:06:43Z" level=info msg="Parsed CHECK_SERVICE_ACCOUNT: default"
time="2022-12-16T14:06:43Z" level=info msg="Check time limit set to: 14m46.65691144s"
time="2022-12-16T14:06:43Z" level=info msg="Parsed CHECK_DEPLOYMENT_ROLLING_UPDATE: true"
time="2022-12-16T14:06:43Z" level=info msg="Check deployment image will be rolled from [local.imageregistry.com/nginxinc/nginx-unprivileged:1.17.8] to [local.imageregistry.com/nginxinc/nginx-unprivileged:1.17.9]"
time="2022-12-16T14:06:43Z" level=debug msg="Allowing this check 14m46.65691144s to finish."
time="2022-12-16T14:06:43Z" level=info msg="Kubernetes client created."
time="2022-12-16T14:06:43Z" level=info msg="Waiting for node to become ready before starting check."
time="2022-12-16T14:06:43Z" level=debug msg="Checking if the kuberhealthy endpoint: http://kuberhealthy.kuberhealthy.svc.cluster.local/externalCheckStatus is ready."
time="2022-12-16T14:06:43Z" level=debug msg="http://kuberhealthy.kuberhealthy.svc.cluster.local/externalCheckStatus is ready."
time="2022-12-16T14:06:43Z" level=debug msg="Kuberhealthy endpoint: http://kuberhealthy.kuberhealthy.svc.cluster.local/externalCheckStatus is ready. Proceeding to run check."
time="2022-12-16T14:06:43Z" level=info msg="Starting check."
time="2022-12-16T14:06:43Z" level=info msg="Wiping all found orphaned resources belonging to this check."
time="2022-12-16T14:06:43Z" level=info msg="Attempting to find previously created service(s) belonging to this check."
time="2022-12-16T14:06:43Z" level=debug msg="Found 1 service(s)."
time="2022-12-16T14:06:43Z" level=debug msg="Service: kuberhealthy"
time="2022-12-16T14:06:43Z" level=info msg="Did not find any old service(s) belonging to this check."
time="2022-12-16T14:06:43Z" level=info msg="Attempting to find previously created deployment(s) belonging to this check."
time="2022-12-16T14:06:44Z" level=debug msg="Found 1 deployment(s)"
time="2022-12-16T14:06:44Z" level=debug msg=kuberhealthy
time="2022-12-16T14:06:44Z" level=info msg="Did not find any old deployment(s) belonging to this check."
time="2022-12-16T14:06:44Z" level=info msg="Successfully cleaned up prior check resources."
time="2022-12-16T14:06:44Z" level=info msg="Creating deployment resource with 2 replica(s) in kuberhealthy namespace using image [local.imageregistry.com/nginxinc/nginx-unprivileged:1.17.8] with environment variables: map[]"
time="2022-12-16T14:06:44Z" level=info msg="Creating container using image [local.imageregistry.com/nginxinc/nginx-unprivileged:1.17.8] with environment variables: map[]"
time="2022-12-16T14:06:44Z" level=info msg="Created deployment resource."
time="2022-12-16T14:06:44Z" level=info msg="Creating deployment in cluster with name: deployment-deployment"
time="2022-12-16T14:06:44Z" level=info msg="Watching for deployment to exist."
time="2022-12-16T14:06:44Z" level=debug msg="Received an event watching for deployment changes: deployment-deployment got event ADDED"
time="2022-12-16T14:06:47Z" level=debug msg="Received an event watching for deployment changes: deployment-deployment got event MODIFIED"
time="2022-12-16T14:06:47Z" level=debug msg="Received an event watching for deployment changes: deployment-deployment got event MODIFIED"
time="2022-12-16T14:06:52Z" level=debug msg="Received an event watching for deployment changes: deployment-deployment got event MODIFIED"
time="2022-12-16T14:06:52Z" level=info msg="Deployment is reporting Available with True."
time="2022-12-16T14:06:52Z" level=info msg="Created deployment in kuberhealthy namespace: deployment-deployment"
time="2022-12-16T14:06:52Z" level=info msg="Creating service resource for kuberhealthy namespace."
time="2022-12-16T14:06:52Z" level=info msg="Created service resource."
time="2022-12-16T14:06:52Z" level=info msg="Creating service in cluster with name: deployment-svc"
time="2022-12-16T14:06:52Z" level=info msg="Watching for service to exist."
time="2022-12-16T14:06:52Z" level=debug msg="Received an event watching for service changes: ADDED"
time="2022-12-16T14:06:52Z" level=info msg="Cluster IP found: 10.x.x.xx"
time="2022-12-16T14:06:52Z" level=info msg="Created service in kuberhealthy namespace: deployment-svc"
time="2022-12-16T14:06:52Z" level=debug msg="Retrieving a cluster IP belonging to: deployment-svc"
time="2022-12-16T14:06:52Z" level=info msg="Found service cluster IP address: 10.x.x.xx"
time="2022-12-16T14:06:52Z" level=info msg="Looking for a response from the endpoint."
time="2022-12-16T14:06:52Z" level=debug msg="Setting timeout for backoff loop to: 3m0s"
time="2022-12-16T14:06:52Z" level=info msg="Beginning backoff loop for HTTP GET request."
time="2022-12-16T14:06:52Z" level=debug msg="Making GET to http://10.x.x.xx"
time="2022-12-16T14:06:52Z" level=info msg="Successfully made an HTTP request on attempt: 1"
time="2022-12-16T14:06:52Z" level=info msg="Got a 200 with a GET to http://10.x.x.xx"
time="2022-12-16T14:06:52Z" level=info msg="Got a result from GET request backoff: 200 OK"
time="2022-12-16T14:06:52Z" level=info msg="Successfully hit service endpoint."
time="2022-12-16T14:06:52Z" level=info msg="Rolling update option is enabled. Performing roll."
time="2022-12-16T14:06:52Z" level=info msg="Creating deployment resource with 2 replica(s) in kuberhealthy namespace using image [local.imageregistry.com/nginxinc/nginx-unprivileged:1.17.9] with environment variables: map[]"
time="2022-12-16T14:06:52Z" level=info msg="Creating container using image [local.imageregistry.com/nginxinc/nginx-unprivileged:1.17.9] with environment variables: map[]"
time="2022-12-16T14:06:52Z" level=info msg="Created rolling-update deployment resource."
time="2022-12-16T14:06:52Z" level=info msg="Performing rolling-update on deployment deployment-deployment to [local.imageregistry.com/nginxinc/nginx-unprivileged:1.17.9]"
time="2022-12-16T14:06:53Z" level=debug msg="Watching for deployment rolling-update to complete."
time="2022-12-16T14:06:53Z" level=debug msg="Received an event watching for deployment changes: deployment-deployment got event ADDED"
time="2022-12-16T14:06:56Z" level=debug msg="Received an event watching for deployment changes: deployment-deployment got event MODIFIED"
time="2022-12-16T14:07:02Z" level=debug msg="Received an event watching for deployment changes: deployment-deployment got event MODIFIED"
time="2022-12-16T14:07:02Z" level=debug msg="Received an event watching for deployment changes: deployment-deployment got event MODIFIED"
time="2022-12-16T14:07:02Z" level=debug msg="Received an event watching for deployment changes: deployment-deployment got event MODIFIED"
time="2022-12-16T14:07:08Z" level=debug msg="Received an event watching for deployment changes: deployment-deployment got event MODIFIED"
time="2022-12-16T14:07:13Z" level=debug msg="Received an event watching for deployment changes: deployment-deployment got event MODIFIED"
time="2022-12-16T14:07:13Z" level=debug msg="Rolling-update is assumed to be completed, sending result to channel."
time="2022-12-16T14:07:13Z" level=info msg="Rolled deployment in kuberhealthy namespace: deployment-deployment"
time="2022-12-16T14:07:13Z" level=info msg="Looking for a response from the endpoint."
time="2022-12-16T14:07:13Z" level=debug msg="Setting timeout for backoff loop to: 3m0s"
time="2022-12-16T14:07:13Z" level=info msg="Beginning backoff loop for HTTP GET request."
time="2022-12-16T14:07:13Z" level=debug msg="Making GET to http://10.x.x.xx"
time="2022-12-16T14:07:13Z" level=info msg="Successfully made an HTTP request on attempt: 1"
time="2022-12-16T14:07:13Z" level=info msg="Got a 200 with a GET to http://10.x.x.xx"
time="2022-12-16T14:07:13Z" level=info msg="Got a result from GET request backoff: 200 OK"
time="2022-12-16T14:07:13Z" level=info msg="Successfully hit service endpoint after rolling-update."
time="2022-12-16T14:07:13Z" level=info msg="Cleaning up deployment and service."
time="2022-12-16T14:07:13Z" level=info msg="Attempting to delete service deployment-svc in kuberhealthy namespace."
time="2022-12-16T14:07:13Z" level=debug msg="Checking if service has been deleted."
time="2022-12-16T14:07:13Z" level=debug msg="Delete service and wait has not yet timed out."
time="2022-12-16T14:07:13Z" level=debug msg="Waiting 5 seconds before trying again."
time="2022-12-16T14:07:18Z" level=info msg="Attempting to delete deployment in kuberhealthy namespace."
time="2022-12-16T14:07:18Z" level=debug msg="Checking if deployment has been deleted."
time="2022-12-16T14:07:18Z" level=debug msg="Delete deployment and wait has not yet timed out."
time="2022-12-16T14:07:18Z" level=debug msg="Waiting 5 seconds before trying again."

dushyy avatar Dec 16 '22 14:12 dushyy

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment on the issue or this will be closed in 15 days.

github-actions[bot] avatar Jan 26 '24 01:01 github-actions[bot]

I am not sure what's up here, but this isn't a currently known bug, so it might be something very hard to diagnose in your cluster?

integrii avatar Feb 10 '24 06:02 integrii

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment on the issue or this will be closed in 15 days.

github-actions[bot] avatar Mar 21 '24 00:03 github-actions[bot]

This issue was closed because it has been stalled for 15 days with no activity. Please reopen and comment on the issue if you believe it should stay open.

github-actions[bot] avatar Apr 06 '24 00:04 github-actions[bot]