helm-charts

[synthetic-monitoring-agent] fix deployment not starting on update/auto-scaling

Open Iridias opened this issue 1 year ago • 1 comments

The implementation of the agent does not seem to allow for concurrency. Scaling the deployment, whether during a helm-upgrade or through auto-scaling, therefore results in new pods that never become ready!

So any helm-upgrade will run into a timeout and then abort.

In the logs you'll find the following messages (if debug is enabled):

{"level":"info","program":"synthetic-monitoring-agent","subsystem":"updater","error":"registering probe with synthetic-monitoring-api, response: probe already exists","was_connected":false,"connection_state":"READY","time":1708611775289,"caller":"github.com/grafana/synthetic-monitoring-agent/internal/checks/checks.go:259","message":"broke out of loop"}
{"level":"warn","program":"synthetic-monitoring-agent","subsystem":"updater","error":"registering probe with synthetic-monitoring-api, response: probe already exists","connection_state":"READY","time":1708611775289,"caller":"github.com/grafana/synthetic-monitoring-agent/internal/checks/checks.go:309","message":"handling check changes"}

With emphasis on: response: probe already exists

To fix that, I changed the Deployment to a StatefulSet, as k8s ensures that the old pod is killed/deleted before the new one is spawned. I also removed all the autoscaling resources, as they're not useful when the agent can't run concurrently anyway.
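
For illustration, the shape of such a StatefulSet might look roughly like the sketch below. This is not the chart's actual template: the resource name, labels, Service name, and image tag are placeholders chosen for the example, and the agent's arguments and credentials are left out.

```yaml
# Sketch only: names, labels, and the image tag are placeholders,
# not the values used by the chart.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: synthetic-monitoring-agent
spec:
  # One replica: a second agent registering the same probe is rejected
  # with "probe already exists".
  replicas: 1
  # StatefulSets need a serviceName; the Service name here is assumed.
  serviceName: synthetic-monitoring-agent
  selector:
    matchLabels:
      app.kubernetes.io/name: synthetic-monitoring-agent
  updateStrategy:
    # With RollingUpdate on a StatefulSet, the existing pod is deleted
    # before its replacement is created.
    type: RollingUpdate
  template:
    metadata:
      labels:
        app.kubernetes.io/name: synthetic-monitoring-agent
    spec:
      containers:
        - name: agent
          image: grafana/synthetic-monitoring-agent:latest  # placeholder tag
          # args/env for the API address and token omitted in this sketch
```

With a single replica and no HorizontalPodAutoscaler, there is never a moment during an upgrade when two pods try to register the same probe.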

And of course, I also successfully tested the changes on one of our clusters.

Iridias, Feb 27 '24 11:02

CLA assistant check
All committers have signed the CLA.

CLAassistant, Feb 27 '24 11:02

@Iridias Can you split your PR into one PR per chart? Otherwise the CI won't be able to merge.

zanhsieh, Mar 31 '24 22:03