cluster-api-provider-aws

:bug: wait for lb dns name to propagate before resolving

Open r4f4 opened this issue 1 year ago • 5 comments

What type of PR is this?

/kind bug

What this PR does / why we need it:

Instead of trying to resolve the primary LB DNS name right after the LB is created, wait for the name to propagate so that resolution is more likely to succeed.

This fixes an issue where an initial "no such host" DNS response, cached with a high TTL, would make CAPA spin for minutes (as long as 15!) waiting for the DNS name to resolve, even though the name had already propagated a few minutes after the first attempt.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged): Fixes #5032

Special notes for your reviewer:

I couldn't find a more elegant way to solve this than sleeping after the LB is created. I wanted to add a retryAfterDuration here, right after the DNS name is set and before name resolution is attempted, but that would require persisting a timestamp between reconcile loops.

Checklist:

  • [ ] squashed commits
  • [ ] includes documentation
  • [X] includes emojis
  • [ ] adds unit tests
  • [ ] adds or updates e2e tests

Release note:

Fixed a possible issue with long wait times for primary Load Balancer DNS name resolution.

r4f4 · Jun 22 '24 09:06