Make Stage controller requeue delay configurable
Checklist
- [x] I've searched the issue queue to verify this is not a duplicate feature request.
- [x] I've pasted the output of kargo version, if applicable.
- [x] I've pasted logs, if applicable.
Proposed Feature
Make the reconcile requeue delay configurable for these controllers:
- cluster_config
- stage
- projects
- project_config
Currently the controllers use a hard-coded five-minute delay. For example, in internal/controller/stages/regular_stages.go at lines 383–385:
// Otherwise, requeue after a delay.
// TODO: Make the requeue delay configurable.
return ctrl.Result{RequeueAfter: 5 * time.Minute}, nil
Expose a single config knob per controller (env var and Helm value) with a default of 5m.
Motivation
Operators cannot tune reconciliation frequency without rebuilding or patching. Configurable intervals allow faster feedback for time-sensitive workloads and lower resource usage for quiet ones, without using custom patched versions.
This, which seems to overlap with https://github.com/akuity/kargo/issues/5162, would help make Kargo promotions more responsive. An issue we currently have is that any steps that effectively poll (e.g. waiting on a PR merge, or http steps that use polling) take too long to detect changes in the remote systems they are waiting on.
Ideally, promotion flows should be responsive, with near-real-time detection of changes in these types of polling tasks.
As a basic workaround, one can hit the refresh button on a promotion which seems to force an immediate re-request. But that's not a workable solution in the wild.
As a better workaround, we're considering a cron job that runs every few seconds and simply calls kargo refresh stage on the stages that we need to be more responsive, though I'm not sure what the side effects of that might be.
Ideally, Kargo would make this configurable per stage, per promotion, or per step so that we can tailor things a little more to our needs.
We know this is something that people want -- so much so that I'm pinning it for visibility and a lesser chance of more duplicate issues being opened. This is probably going to happen eventually. To be transparent about why we're not prioritizing this more highly, we anticipate significant performance impacts when everyone inevitably goes and cranks the interval way down.
There are many other issues open having to do with performance improvements and, in general, our goal is improving on individual factors that are making people want to crank the interval way down so that when this feature is eventually added, not everyone will feel compelled to immediately start re-reconciling Promotions every 10s (for instance) and bring their cluster to its knees.