
Reloader frequent restarts due to failed liveness probes

Open messiahUA opened this issue 3 years ago • 4 comments

I'm facing an issue with frequent restarts on v0.0.95.

The liveness endpoint (/metrics) sometimes takes 1-5 seconds to respond. Of course I can just increase the probe timeout (which is 1 second by default), but this only hides the problem. I believe there is some inefficiency in the code which affects even /metrics responses.

There are quite a lot of secrets and configmaps in the cluster, so that might put a strain on it, but there are no CPU and memory limits, so it should just take as much as it needs and continue working. I think that /metrics should have its own thread, or even better, there should be dedicated /readiness and /liveness endpoints which actually check and appropriately report the status of the service. Otherwise it's unreliable to run in production, especially considering there is no HA: if the pod is restarted, I believe it will lose any info about the resources and may miss triggering reloads.

messiahUA avatar Jul 22 '21 14:07 messiahUA
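
For illustration, the dedicated probe endpoints suggested above could look roughly like the Go sketch below. This is not Reloader's actual code: the `health` type, `newProbeMux`, `probeStatus`, and the `/live` and `/ready` paths are all hypothetical names chosen for the example, and it assumes Go 1.19+ for `atomic.Bool`. The idea is that the probe handlers do no heavy work, so their latency does not depend on how long metrics collection takes.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// health is a hypothetical holder for controller state (not part of Reloader).
// ready would be set once the initial sync of watched resources has finished.
type health struct {
	ready atomic.Bool
}

// newProbeMux returns a mux with cheap, dedicated probe endpoints,
// kept separate from any expensive handler such as /metrics.
func newProbeMux(h *health) *http.ServeMux {
	mux := http.NewServeMux()

	// Liveness: only answers "the process can serve HTTP".
	mux.HandleFunc("/live", func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})

	// Readiness: 503 until the (assumed) initial sync has completed.
	mux.HandleFunc("/ready", func(w http.ResponseWriter, r *http.Request) {
		if !h.ready.Load() {
			http.Error(w, "not ready", http.StatusServiceUnavailable)
			return
		}
		w.WriteHeader(http.StatusOK)
	})
	return mux
}

// probeStatus sends an in-memory GET to the handler and returns the status code.
func probeStatus(handler http.Handler, path string) int {
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, path, nil)
	handler.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	h := &health{}
	mux := newProbeMux(h)

	// Before the sync flag is set: liveness passes, readiness does not.
	fmt.Println(probeStatus(mux, "/live"), probeStatus(mux, "/ready"))

	// After the sync flag is set: readiness passes as well.
	h.ready.Store(true)
	fmt.Println(probeStatus(mux, "/ready"))
}
```

In a real deployment the mux would be served with `http.ListenAndServe`, possibly on its own port, and the pod's liveness and readiness probes would point at these paths instead of /metrics. Go's `net/http` server already handles each connection in its own goroutine, so a slow /metrics handler would not by itself block these handlers unless they contend for the same lock or the process is starved of CPU.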

> There are quite a lot of secrets and configmaps in the cluster

How many exactly?

rasheedamir avatar Jul 22 '21 16:07 rasheedamir

> How many exactly?

979 (configmaps + secrets)

Although I want to note that only some of the workloads have the Reloader annotation, with only one secret referenced each.

messiahUA avatar Jul 22 '21 18:07 messiahUA

@faizanahmad055 can you take a look?

rasheedamir avatar Jul 30 '21 10:07 rasheedamir

@rasheedamir sure, will take a look.

faizanahmad055 avatar Jul 30 '21 10:07 faizanahmad055