
Fix watchReapableServices getting stuck after failures

Open geobeau opened this issue 11 months ago • 0 comments

Changes proposed in this PR:

If consulClient.Catalog().NodeServiceList keeps failing for more than 15 minutes (the maximum duration of NewExponentialBackOff()), the loop gets stuck because minWaitCh is not rearmed: case <-minWaitCh: then waits forever.

The fix is to make sure minWaitCh is always rearmed, whether or not there was an error. The wait could be set to 0 in the error case, but since the call had already been failing for 15 minutes, waiting another minWait is not an issue.

Also added a log in the backoff function to show the error at every retry (and not only when the retry backoff finally gives up). This should make it easier to see that something is not working at all.

How I've tested this PR:

Manually

How I expect reviewers to test this PR:

  • Set -consul-api-timeout=1s on sync-catalog so that the function mentioned above always fails. After waiting 15 minutes, sync-catalog should continue to retry with the patch; without it, the loop stays stuck forever.

Checklist:

geobeau avatar Jul 21 '23 07:07 geobeau