cluster-api-provider-aws 🌱Migrate elb to AWS SDK v2

trafficstars

What type of PR is this? /kind cleanup /kind deprecation

What this PR does / why we need it: This PR migrates elb to AWS SDK V2

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged): Fixes #5405

Special notes for your reviewer:

Checklist:

[x] squashed commits
[ ] includes documentation
[x] includes emoji in title
[x] adds unit tests
[x] adds or updates e2e tests

Release note:

Migrate elb to AWS SDK v2

Jun 28 '25 06:06 phuhung273

Hi @phuhung273. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Jun 28 '25 06:06 k8s-ci-robot

/ok-to-test

Jun 30 '25 08:06 damdo

@phuhung273 it looks like the linter test is not happy, would you be able to fix it? Thanks!

Jun 30 '25 09:06 damdo

Sure thanks @damdo for taking a look, lint issue fixed.

Jun 30 '25 12:06 phuhung273

/test pull-cluster-api-provider-aws-e2e-eks pull-cluster-api-provider-aws-e2e

Jun 30 '25 12:06 phuhung273

/test pull-cluster-api-provider-aws-e2e

Jun 30 '25 13:06 damdo

@damdo: GitHub didn't allow me to assign the following users: punkwalker.

Note that only kubernetes-sigs members with read permissions, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time. For more information please see the contributor guide

In response to this:

Looks reasonable to me, left one question.

/assign @punkwalker @nrb @richardcase

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Jun 30 '25 16:06 k8s-ci-robot

@phuhung273
Thank you for working on this.

update: I have created a PR #5574 which will solve ServiceLimiter problem. You can find more details in it.

On Side note: Looks like you implemented a new method WithRequestMetricContextMiddleware in metricsV2 package during SQS client migration. Any specific reason you did not use the existing awsmetricsv2,WithMiddleware option?

Jun 30 '25 22:06 punkwalker

@punkwalker The reason i didn't use awsmetricsv2.WithMiddleware because it also include recordAWSPermissionsIssue(target) - which the V1 NewGlobalSQSClient doesn't have https://github.com/kubernetes-sigs/cluster-api-provider-aws/pull/5562/files#diff-bc3c4b26fb02ca5f71b1e4bbcefb77736adcbefa0a5f1df66b48228bed9ecb7cL136-L139

So I created WithRequestMetricContextMiddleware with only RequestMetric. Im not fully understand this part so it might be wrong, pls correct me.

Jul 01 '25 05:07 phuhung273

@punkwalker The reason i didn't use awsmetricsv2.WithMiddleware because it also include recordAWSPermissionsIssue(target) - which the V1 NewGlobalSQSClient doesn't have https://github.com/kubernetes-sigs/cluster-api-provider-aws/pull/5562/files#diff-bc3c4b26fb02ca5f71b1e4bbcefb77736adcbefa0a5f1df66b48228bed9ecb7cL136-L139

So I created WithRequestMetricContextMiddleware with only RequestMetric. Im not fully understand this part so it might be wrong, pls correct me.

I think it is better to keep the consistency between clients. There is no harm in using the permission middleware as it is only for instrumentation.

Jul 01 '25 07:07 punkwalker

/retest-required

Jul 02 '25 13:07 phuhung273

/retest-required

Jul 02 '25 15:07 phuhung273

@phuhung273 we may need to check if e2e failures are legit or not

Jul 02 '25 16:07 damdo

Sure @damdo, I'm still not really get used to how an actual error looks like. If this run doesn't work I will try my local

Jul 02 '25 16:07 phuhung273

/test pull-cluster-api-provider-aws-e2e-blocking

Jul 03 '25 08:07 damdo

@phuhung273 there are panics in the CAPA controller (here)

E0703 09:08:52.610094       1 signal_unix.go:917] "Observed a panic" controller="awscluster" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="AWSCluster" AWSCluster="quick-start-ee17bg/quick-start-m3dnl6-rd227" namespace="quick-start-ee17bg" name="quick-start-m3dnl6-rd227" reconcileID="5bf89cf8-313e-4ab9-9650-8102d92616f0" panic="runtime error: invalid memory address or nil pointer dereference" panicGoValue="\"invalid memory address or nil pointer dereference\"" stacktrace=<
	goroutine 382 [running]:
	k8s.io/apimachinery/pkg/util/runtime.logPanic({0x64b4800, 0xc002a386f0}, {0x52db8e0, 0x8d45210})
		/go/pkg/mod/k8s.io/[email protected]/pkg/util/runtime/runtime.go:107 +0xbc
	sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Reconcile.func1()
		/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:108 +0x112
	panic({0x52db8e0?, 0x8d45210?})
		/usr/local/go/src/runtime/panic.go:791 +0x132
	sigs.k8s.io/cluster-api-provider-aws/v2/pkg/cloud/awserrors.(*SmithyError).ErrorCode(...)
		/workspace/pkg/cloud/awserrors/errors.go:268
	sigs.k8s.io/cluster-api-provider-aws/v2/pkg/cloud/throttle.getServiceLimiterMiddleware.func1({0x64b4800, 0xc002b28c90}, {{0x5a9d120?, 0xc002b28c00?}}, {0x64539e0, 0xc00242f540})
		/workspace/pkg/cloud/throttle/throttle.go:175 +0xd8
	github.com/aws/smithy-go/middleware.finalizeMiddlewareFunc.HandleFinalize(...)
		/go/pkg/mod/github.com/aws/[email protected]/middleware/step_finalize.go:67
	github.com/aws/smithy-go/middleware.decoratedFinalizeHandler.HandleFinalize(...)
		/go/pkg/mod/github.com/aws/[email protected]/middleware/step_finalize.go:200
	sigs.k8s.io/cluster-api-provider-aws/v2/pkg/cloud/metricsv2.getRequestMetricContextMiddleware.func1({0x64b4800, 0xc002b28c90}, {{0x5a9d120?, 0xc002b28c00?}}, {0x64539e0, 0xc00242f560})
		/workspace/pkg/cloud/metricsv2/metrics.go:149 +0x1b8
...

Would you please be able to check?

Having a quick look it might be starting from here and propagating down to the awserrors/errors.go:268

Jul 03 '25 10:07 damdo

Thanks @damdo for pointing it out. Fixed by adding a nil check to ServiceLimiter. Let me know if we should start full e2e or wait for unresolved conversations.

Jul 03 '25 14:07 phuhung273

/test pull-cluster-api-provider-aws-e2e-eks pull-cluster-api-provider-aws-e2e

Jul 04 '25 00:07 phuhung273

/test pull-cluster-api-provider-aws-e2e

Jul 04 '25 04:07 phuhung273

Thank you for helping out the entire process @damdo

Jul 04 '25 07:07 phuhung273

LGTM label has been added.

Git tree hash: c6ed3cdc90c7eaafbd2c7ae1b3deb5477eeb3a5b

Jul 04 '25 08:07 k8s-ci-robot

/assign @AndiDog @dlipovetsky

Jul 04 '25 08:07 damdo

Thanks

/lgtm

@punkwalker let us know if you are happy with these latest changes.

/assign @richardcase @nrb

Yes. LGTM.

Jul 04 '25 08:07 punkwalker

/approve

Jul 07 '25 17:07 nrb

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: nrb

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [nrb]

Approvers can indicate their approval by writing /approve in a comment Approvers can cancel approval by writing /approve cancel in a comment

Jul 07 '25 17:07 k8s-ci-robot

cluster-api-provider-aws cluster-api-provider-aws copied to clipboard

🌱Migrate elb to AWS SDK v2

cluster-api-provider-aws
cluster-api-provider-aws copied to clipboard