cluster-api-provider-aws
cluster-api-provider-aws copied to clipboard
:bug: wait for lb dns name to propagate before resolving
What type of PR is this?
/kind bug
What this PR does / why we need it:
Instead of trying to resolve the primary LB DNS name right after its creation, wait for it to propagate so the resolution is most likely to succeed.
This fixes an issue where the first "no such host" cached dns response with high TTL would make CAPA spin for minutes (as high as 15!) waiting for the DNS name to resolve even though it had already propagated a few minutes after the first attempt.
Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #5032
Special notes for your reviewer:
I couldn't find a more elegant way to solve this other than a sleep after the LB is created. I wanted to add a retryAfterDuration here right after the DNS name is set and before the name resolution is attempted but it would involve somehow saving state of the timestamp in between reconcile loops.
Checklist:
- [ ] squashed commits
- [ ] includes documentation
- [X] includes emojis
- [ ] adds unit tests
- [ ] adds or updates e2e tests
Release note:
Fixed a possible issue with long wait times for primary Load Balancer DNS name resolution.