
Add Webhook Jitter to Prevent Thundering-Herd Refresh Spikes

Open adityaraj178 opened this issue 3 weeks ago • 2 comments

Summary

Large webhook events currently push all affected Applications into the controller at once, causing thundering-herd behavior and heavy load on the repo server.

Motivation

In environments where a single repository commit impacts hundreds or thousands of Applications, all webhook-triggered refreshes arrive simultaneously. This results in bursts of git operations, parallel manifest generation, API server spikes, and prolonged controller CPU saturation. These load surges can slow down reconciliation, degrade repo-server performance, and in extreme cases overload control-plane components. By adding jitter to webhook-triggered refresh events, we can eliminate these synchronized spikes and maintain stable cluster behavior even under large-scale Git updates.

Proposal

Expose jitter configuration through the webhook.reconciliation.jitter field in the argocd-cm ConfigMap, supporting standard duration strings. The feature remains fully backward compatible when jitter is disabled.

adityaraj178 avatar Dec 01 '25 16:12 adityaraj178

I can take this up. However, I'd like to first bring this up in this week's contributor meeting.

nitishfy avatar Dec 02 '25 10:12 nitishfy

Hi @nitishfy, I had already created a PR; I forgot to link it to close this issue :)

adityaraj178 avatar Dec 02 '25 11:12 adityaraj178

Perhaps only apply the jitter when the number of affected Applications is greater than 10. This way, it does not unnecessarily delay the application sync/refresh for normal usage.

agaudreault avatar Dec 11 '25 16:12 agaudreault