aws-load-balancer-controller
Add support for VPC Endpoint Services
Issue
https://github.com/kubernetes-sigs/aws-load-balancer-controller/issues/1859
Description
This change adds support for creating, deleting, and updating VPC endpoint services along with their permissions. It supports the three suggested configuration options: allowed principals, acceptance required, and private DNS name.
An enable annotation was also added to support enabling/disabling the feature on a per-load-balancer basis. The proposed design only supported enabling/disabling this at the controller level.
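To illustrate the configuration surface described above, here is a minimal sketch of how a Service might opt in to the feature. The annotation keys shown are illustrative placeholders only; they are not confirmed by this PR and the actual names may differ:

```yaml
# Hypothetical example: the annotation keys below are placeholders, not the PR's final names.
apiVersion: v1
kind: Service
metadata:
  name: my-service
  annotations:
    # Enable VPC endpoint service creation for this load balancer (per-LB toggle).
    service.beta.kubernetes.io/aws-load-balancer-endpoint-service-enabled: "true"
    # Principals allowed to create interface endpoints to this service.
    service.beta.kubernetes.io/aws-load-balancer-endpoint-service-allowed-principals: "arn:aws:iam::111122223333:root"
    # Whether endpoint connection requests must be manually accepted.
    service.beta.kubernetes.io/aws-load-balancer-endpoint-service-acceptance-required: "false"
    # Private DNS name offered by the service provider.
    service.beta.kubernetes.io/aws-load-balancer-endpoint-service-private-dns-name: "example.internal"
spec:
  type: LoadBalancer
  selector:
    app: my-app
  ports:
    - port: 80
      targetPort: 8080
```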
Outputting the created VPC endpoint service parameters is deliberately not supported, as there are still open questions about the best approach, and this change is already far too big; adding more to it does not seem sensible. This feature can be added in a follow-up change.
This change has been manually tested on a Kubernetes service.
I apologise for the large PR. I couldn't determine suitable breakpoints to split this change up into smaller changes, but I can do so if you can suggest them. I'm also fairly new to Golang, so I'm very happy for any feedback as I know the code quality isn't amazing here.
Checklist
- [x] Added tests that cover your change (if possible)
- [x] Added/modified documentation as required (such as the README.md, or the `docs` directory)
- [x] Manually tested
- [x] Made sure the title of the PR is a good description that can go into the release notes
BONUS POINTS checklist: complete for good vibes and maybe prizes?! :exploding_head:
- [ ] Backfilled missing tests for code in same general area :tada:
- [ ] Refactored something and made the world a better place :star2:
Hi @hintofbasil. Thanks for your PR.
I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.
Once the patch is verified, the new status will be reflected by the ok-to-test label.
I understand the commands that are listed here.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: hintofbasil
To complete the pull request process, please assign kishorj after the PR has been reviewed.
You can assign the PR to them by writing /assign @kishorj in a comment when ready.
The full list of commands accepted by this bot can be found here.
Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment
@hintofbasil, thanks for your changes. We are currently finalizing the v2.4.2 patch release; I will review your changes after the patch release.
Codecov Report
Patch coverage: 63.57% and project coverage change: +0.48 :tada:
Comparison is base (ff8c13d) 55.05% compared to head (afa4b7e) 55.53%.
Additional details and impacted files
```diff
@@            Coverage Diff             @@
##             main    #2636      +/-   ##
==========================================
+ Coverage   55.05%   55.53%   +0.48%
==========================================
  Files         148      157       +9
  Lines        8532     9002     +470
==========================================
+ Hits         4697     4999     +302
- Misses       3505     3662     +157
- Partials      330      341      +11
```
| Impacted Files | Coverage Δ | |
|---|---|---|
| pkg/config/addons_config.go | 0.00% <0.00%> (ø) | |
| pkg/deploy/elbv2/listener_manager.go | 1.08% <ø> (ø) | |
| pkg/deploy/stack_deployer.go | 0.00% <0.00%> (ø) | |
| pkg/deploy/tracking/provider_mocks.go | 0.00% <0.00%> (ø) | |
| pkg/networking/vpc_endpoint_service_info.go | 0.00% <0.00%> (ø) | |
| pkg/networking/vpc_endpoint_service_manager.go | 0.00% <0.00%> (ø) | |
| ...g/networking/vpc_endpoint_service_manager_mocks.go | 0.00% <0.00%> (ø) | |
| pkg/service/model_build_endpoint_service.go | 54.00% <54.00%> (ø) | |
| pkg/service/model_builder.go | 86.86% <60.00%> (-1.43%) | :arrow_down: |
| pkg/deploy/ec2/tagging_manager__mocks.go | 69.69% <69.69%> (ø) | |
| ... and 5 more | | |
:umbrella: View full report in Codecov by Sentry.
Hi @kishorj, thanks for agreeing to take a look. A few people at Skyscanner have agreed to review it in the meantime; Matteo has already put one review in. Hopefully this makes things easier for you.
We are planning on deploying this internally in the next few weeks. We understand it likely won't be reviewed by then, but would it be possible to get the e2e tests run against this? It would make us feel much more confident in running a fork, and setting those tests up ourselves seems unrealistic.
/ok-to-test
This has been rebased and the formatting changes above implemented.
I'm unable to replicate the e2e test failure.
I've added the following resources to a cluster running this branch of the controller.
```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: test
---
apiVersion: networking.k8s.io/v1
kind: IngressClass
metadata:
  name: aws-load-balancer-controller-test
spec:
  controller: ingress.k8s.aws/alb
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: aws-load-balancer-controller-test
  namespace: test
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/subnets: subnet-id-1,subnet-id-2
spec:
  ingressClassName: aws-load-balancer-controller-test
  rules:
    - host: ""
      http:
        paths:
          - path: /path
            pathType: Exact
            backend:
              service:
                name: aws-load-balancer-controller-test
                port:
                  name: http
---
apiVersion: v1
kind: Service
metadata:
  name: aws-load-balancer-controller-test
  namespace: test
spec:
  ports:
    - name: http
      port: 80
      targetPort: http
  selector:
    service: 'null'
  type: NodePort
```
This brings up a load balancer which successfully provisions.
I then deleted the Ingress, and the load balancer was successfully removed. I then deleted the namespace, which was also removed successfully.
From what I can see this follows the exact logic of the test.
As such I'm going to trigger a rerun to see if it was a flake.
/retest
How does this approach differ from @rifelpet's initial pass in #1948?
I didn't see it while reading the code, but I'm wondering whether there are more IAM actions that we'll need to allow in the IAM policy used by this controller.
> I didn't see it while reading the code, but I'm wondering whether there are more IAM actions that we'll need to allow in the IAM policy used by this controller.

You were correct; after adding these, the tests are passing. Thanks!
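For anyone hitting the same test failures: the extra permissions are the EC2 VPC endpoint service APIs. The thread does not list the exact actions that were added, so the statement below (written as a CloudFormation-style YAML fragment) is only a guess at the likely set:

```yaml
# Sketch only: the exact actions added for this PR are not listed in the thread.
PolicyDocument:
  Version: "2012-10-17"
  Statement:
    - Effect: Allow
      Action:
        # Create/delete/modify the endpoint service configuration for the load balancer.
        - ec2:CreateVpcEndpointServiceConfiguration
        - ec2:DeleteVpcEndpointServiceConfigurations
        - ec2:ModifyVpcEndpointServiceConfiguration
        - ec2:DescribeVpcEndpointServiceConfigurations
        # Manage and read the allowed-principals permissions on the service.
        - ec2:ModifyVpcEndpointServicePermissions
        - ec2:DescribeVpcEndpointServicePermissions
      Resource: "*"
```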
> How does this approach differ from @rifelpet's initial pass in https://github.com/kubernetes-sigs/aws-load-balancer-controller/pull/1948?

This is a continuation of their work. Rifelpet provided a rough outline for the code, mainly function definitions and structs. This adds the actual implementation to make the feature work. It sticks pretty closely to Rifelpet's original design, and even includes their commits showing how it was built up from there.
> This is a continuation of their work.
Thank you for the detailed explanation.
I was going to say earlier this morning that configuring the private DNS name on the VPC endpoint service side seemed backwards to me; I had assumed it would be the consumers creating interface VPC endpoints who would decide on the name. But then I went back and read the AWS documentation on the feature and saw that your design matches that feature's interface: the service provider chooses the DNS name. I still find this confusing; that's not your fault, though.
Thanks @rifelpet, seems quite a few of these typos escaped me. I've cleaned them all up in a single commit.
Just wondering if it's ready to be merged? I would like to have this feature in the default branch.
Looks to me like the tests timed out while still progressing. The failed test is given 10 attempts, each taking up to 70 seconds, to call the load balancer, but the test run timed out after approximately two and a half minutes.
Going to try a rerun, as I can't see a hard-coded timeout anywhere and I can't see how this is related to the changes in this PR.
/retest
I'll finish the review this week, and we can seek to merge this after our upcoming v2.4.3 release.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:
- Mark this issue or PR as fresh with `/remove-lifecycle stale`
- Mark this issue or PR as rotten with `/lifecycle rotten`
- Close this issue or PR with `/close`
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
@hintofbasil - Do you have time/bandwidth to rebase this week? Need help? (Eager to try this.) Thanks for the merge!
> @hintofbasil - Do you have time/bandwidth to rebase this week? Need help? (Eager to try this.) Thanks for the merge!
Apologies. I didn't realise it had been so long since I looked at this.
I will try and put some time aside to rebase it this week.
We've been running this branch in production for some time now successfully but it would be great to get it merged upstream so we can get all the recent updates too.
It looks like the cluster failed to delete because one of the node groups wasn't deleted. I can't see any reason in the logs for the node group to stick around.
Open to any ideas of where to look if anyone wanted to lend a hand.
Going to try rerunning the tests in case the issue was a flake as it has that feel about it.
/retest
@hintofbasil I am following your PR closely and eager to try this. Hope you have time to finish this soon!
I think that all the issues raised so far in this PR have now been addressed. If I have missed one then please raise it again.
The PR as a whole is ready for review again.
I will make sure to fix any merge conflicts more promptly and try my best to keep this in a review-ready state from now on.
@hintofbasil I tested your PR and it creates the VPCES, but deletion does not work. The LB controller is stuck with the error:
```
error":"ResourceInUse: Load balancer 'arn:...' cannot be deleted because it is currently associated with another service\n\tstatus code: 400
```
The ES needs to be deleted first. Before that, probably all endpoints need to be rejected.
I also cannot see the endpoint service name. As it cannot be specified, it would be better to attach it to the Service resource when the ES is created, maybe as a message in a condition.
> I tested your PR and it creates the VPCES, but deletion does not work. The LB controller is stuck with the error:
>
> ```
> error":"ResourceInUse: Load balancer 'arn:...' cannot be deleted because it is currently associated with another service\n\tstatus code: 400
> ```
>
> The ES needs to be deleted first. Before that, probably all endpoints need to be rejected.
If you have connected an endpoint to your endpoint service, then there is little I can do here, as AWS doesn't let you delete an endpoint service with connected endpoints. As the endpoints are unknown to the controller, they cannot be automatically cleaned up. What would you imagine happening in this case?
If I've misunderstood the problem then let me know.
> I also cannot see the endpoint service name. As it cannot be specified, it would be better to attach it to the Service resource when the ES is created, maybe as a message in a condition.

I agree it would be useful to expose this somewhere, but to avoid the PR ballooning even more I would suggest this be added in a follow-up PR.
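For that follow-up, one possible shape (purely illustrative; neither the annotation key nor the condition type below exists in this PR) would be for the controller to write the generated endpoint service name back onto the Service, either as an annotation or as a status condition message:

```yaml
# Illustrative sketch only: these fields are not produced by this PR.
apiVersion: v1
kind: Service
metadata:
  name: my-service
  annotations:
    # Hypothetically written back by the controller after the endpoint service is created.
    service.beta.kubernetes.io/aws-load-balancer-endpoint-service-name: "com.amazonaws.vpce.eu-west-1.vpce-svc-0123456789abcdef0"
status:
  conditions:
    # Alternative: surface the name as the message of a status condition.
    - type: EndpointServiceCreated
      status: "True"
      message: "com.amazonaws.vpce.eu-west-1.vpce-svc-0123456789abcdef0"
```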
You are right. Rejecting the connection is not sufficient to delete the service; I didn't realise that. In any case, the removal attempt for the ES should be done first, and the operator should wait until it succeeds before proceeding with the load balancer. That allows proper cleanup of environments without involving the AWS API directly.