cluster-api-provider-aws

🐛: skip APIServerELB DNS name resolution if internal

r4f4 opened this pull request.

What type of PR is this?

/kind bug

What this PR does / why we need it:

Skip APIServerELB DNS name resolution check when the LB is internal. The check will never work in air-gapped systems.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged): Fixes #4975

Special notes for your reviewer:

Checklist:

  • [X] squashed commits
  • [ ] includes documentation
  • [X] includes emojis
  • [ ] adds unit tests
  • [ ] adds or updates e2e tests

Release note:

Skip DNS name resolution check for internal APIServerELB.

r4f4, May 10 '24

Hi @r4f4. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

k8s-ci-robot, May 10 '24

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:

Once this PR has been reviewed and has the lgtm label, please assign vincepri for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment.
Approvers can cancel approval by writing /approve cancel in a comment.

k8s-ci-robot, May 10 '24

/cc @mtulio @patrickdillon

r4f4, May 10 '24

@r4f4: GitHub didn't allow me to request PR reviews from the following users: mtulio, patrickdillon.

Note that only kubernetes-sigs members and repo collaborators can review this PR, and authors cannot review their own PRs.


k8s-ci-robot, May 10 '24

/test pull-cluster-api-provider-aws-test

damdo, May 20 '24

/hold

While we double-check whether the name is resolvable in an SC2S region.

r4f4, May 20 '24

There's a chat thread about why the DNS-resolution wait code was likely added: it may serve as a check that the LB is available. Did you see problems or delays after this change, such as CAPA needing an additional wait before reconciling further (e.g. because the LB/DNS wasn't available yet)? That's the only concern I have. Otherwise LGTM.

AndiDog, May 27 '24

When we encountered the issue, we were creating a cluster in C2S via an emulator (SHIFT). The issue did not occur in another emulator (Combine). We're trying to test in a real C2S/SC2S environment to determine whether the internal LB name is resolvable. If it is, this is an emulator bug. If not, I'll test again and pay attention to the reconcile process.

r4f4, May 27 '24

Please also test in a real, public AWS region if you can. It would be interesting to see how the system behaves without this wait, to find out whether the wait could be removed entirely or whether we would then need some retries for the control plane setup.

AndiDog, May 29 '24

I'll wait to review until hearing back on whether or not this happens in the real AWS environment.

nrb, May 31 '24

We could not reproduce this in a real environment, and SHIFT confirmed there was a bug in the emulator. There is still a possible problem with waiting for DNS resolution, but I'll open a separate issue for that.

/close

r4f4, Jun 21 '24

@r4f4: Closed this PR.


k8s-ci-robot, Jun 21 '24