helm-charts
[synthetic-monitoring-agent] fix deployment not starting on update/auto-scaling
Retry #2994
The implementation of the agent does not seem to allow for concurrency. Thus, scaling the deployment - whether through a helm upgrade or auto-scaling - results in the new Pods never becoming ready.
Any helm upgrade therefore runs into a timeout and aborts.
In the logs you'll find the following messages (if debug logging is enabled):
```
{"level":"info","program":"synthetic-monitoring-agent","subsystem":"updater","error":"registering probe with synthetic-monitoring-api, response: probe already exists","was_connected":false,"connection_state":"READY","time":1708611775289,"caller":"github.com/grafana/synthetic-monitoring-agent/internal/checks/checks.go:259","message":"broke out of loop"}
{"level":"warn","program":"synthetic-monitoring-agent","subsystem":"updater","error":"registering probe with synthetic-monitoring-api, response: probe already exists","connection_state":"READY","time":1708611775289,"caller":"github.com/grafana/synthetic-monitoring-agent/internal/checks/checks.go:309","message":"handling check changes"}
```
With emphasis on: `response: probe already exists`
To fix that, I changed the Deployment to a StatefulSet, as Kubernetes ensures that the old Pod is killed/deleted before spawning the new one.
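The core of the change can be sketched as follows. This is a minimal illustration, not the chart's actual template: the names, labels, and image tag are assumptions, and the real chart renders these from its values.

```yaml
# Before: kind: Deployment. A rolling update starts the new Pod while the old
# one is still running, so the new agent fails to register its probe
# ("probe already exists") and never becomes ready.
# After: kind: StatefulSet. Its rolling update deletes the old Pod first and
# only then creates the replacement, so only one agent registers at a time.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: synthetic-monitoring-agent        # hypothetical name
spec:
  replicas: 1                             # the agent does not support concurrency
  serviceName: synthetic-monitoring-agent # StatefulSets require a (headless) Service
  selector:
    matchLabels:
      app: synthetic-monitoring-agent
  template:
    metadata:
      labels:
        app: synthetic-monitoring-agent
    spec:
      containers:
        - name: agent
          image: grafana/synthetic-monitoring-agent:latest # pin a concrete tag in practice
```

With a Deployment, achieving the same behavior would require `strategy.type: Recreate`, but a StatefulSet gives the delete-before-create ordering by default during rolling updates.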
I also removed all the autoscaling resources, since the agent can't run more than one replica and they're therefore not useful anyway.
I also successfully tested the changes on one of our clusters.