Initial Egress GEP

Open quangnguyen101 opened this issue 2 years ago • 21 comments

What type of PR is this?

/kind gep

What this PR does / why we need it: GEP: Add support to Gateway API for Egress use case to enable egress traffic.

Which issue(s) this PR fixes:

Fixes #1856

Does this PR introduce a user-facing change?:

This GEP will propose a new API EgressRoute for the Gateway API to enable egress traffic.  

quangnguyen101 avatar Apr 24 '23 20:04 quangnguyen101

CLA Signed

The committers listed above are authorized under a signed CLA.

  • :white_check_mark: login: shaneutt / name: Shane Utt (ff40855997de5827bf515b45737c934f0ed84a54)
  • :white_check_mark: login: quangnguyen101 / name: Quang Nguyen (f69e5e8638ebfffc2d2c24dc6d0a245f1af7ed3b, 28049233f6ba8a2b091f81eb9e2e6ac14b1583af, d9dc7d9df75b69a27e232ea77445677f56437be4, 5c49ed49aa929356a8aea3786a5e2bdf5e46df32, f73ace08a7031e0ff9593869ac57c6bc5ab25046, 8ad2040aad0827867066752ec7c25b5f78cb387f, 0075f9600381342e551a989d38651bbd7d055e18, 4721decea6bd7e928e9428fe8ad2d86d3689aa48)

Welcome @quangnguyen101!

It looks like this is your first PR to kubernetes-sigs/gateway-api 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/gateway-api has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. :smiley:

k8s-ci-robot avatar Apr 24 '23 20:04 k8s-ci-robot

Hi @quangnguyen101. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Apr 24 '23 20:04 k8s-ci-robot

Hey @quangnguyen101, would https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.21/#networkpolicyegressrule-v1-networking-k8s-io be another location to add this feature (SNAT)? That way, L3 policies would live in NetworkPolicy and L4-L7 would live in Gateway API?

arkodg avatar Apr 25 '23 16:04 arkodg

Got an email from Kubernetes Triage Robot to jiggle the PR with a tag:

/remove-lifecycle stale

quangnguyen101 avatar Jun 20 '23 18:06 quangnguyen101

Got an email from Kubernetes Triage Robot to jiggle the PR with a tag:

/remove-lifecycle stale

@quangnguyen101 what is needed to help move this forward? Where did we leave off here? :thinking:

shaneutt avatar Jun 20 '23 18:06 shaneutt

Hi @shaneutt, we are still very interested in pursuing this GEP. The last comment was to see if other implementors are interested, and other reviewers will comment when available. I just didn't want the bot to close the GEP. We are currently working on the API design and implementation but are holding off on adding to the GEP while we await more feedback. Please let me know if those are the correct next steps.

As always, thanks for the quick feedback.

quangnguyen101 avatar Jun 20 '23 18:06 quangnguyen101

/cc @youngnick @robscott

shaneutt avatar Jun 20 '23 18:06 shaneutt

@quangnguyen101 please do keep in mind that this GEP is in a bit of a weird spot, all things considered: it's going well outside of the historical space under which Gateway API was considered to operate, and as such if this merges there's no guarantee that we'll ever promote it beyond Provisional. I just wanna make sure you're aware that if we can't find a great fit here, it's possible we might shelve it.

That said, the intention is to help move proposals forward until we're certain it's gonna fit, or not. I think its current state, being a statement of the problem and that we want to solve the problem without trying yet to say "how" we're going to solve it, is great, and so I'm in favor of starting here.

/approve /lgtm

Hold however until either @robscott or @youngnick (or both) approve.

/hold

shaneutt avatar Jul 03 '23 18:07 shaneutt

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: quangnguyen101, shaneutt

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment.
Approvers can cancel approval by writing /approve cancel in a comment.

k8s-ci-robot avatar Jul 03 '23 18:07 k8s-ci-robot

Hi Shane,

Thanks for the heads up. I will definitely make it to the next meeting. The Tuesday sync time conflicts with a weekly team mtg so I was unable to make it this week. No worries about the status of the GEP. Just happy to know it's still in play.

Q

On Wed, Aug 2, 2023 at 7:55 AM Shane Utt @.***> wrote:

Going to bring this one up for a reminder at next week's community sync. A heads up that we recently announced feature freeze for features not essential to GA until v1 releases in the coming months, so this can merge but it will need to stay in Provisional status until GA.

quangnguyen101 avatar Aug 02 '23 17:08 quangnguyen101

@quangnguyen101 it's definitely still in play. I actually deleted the previous comment because of a change in plans. Rather than discuss this at the next meeting I wanted to kind of pick your brain directly here in the PR first:

After talking this over with my fellow maintainers, it seems the idea of an EgressRoute causes a bit of confusion. And for the record, I am sorry if that feedback feels out of band: it's unfortunately been a matter of priorities as we're in a really tight spot to release GA on time for Chicago, and we're waiting to actively prioritize efforts not required for GA until after that release.

Have you had time to think about other ways in which we could model and define egress traffic behavior, without using an entire Route object just for that? For your use case, would alternatives be viable, such as a specification on existing routes that marks them as egress paths? Basically I'm wondering what thoughts you might have on alternative models. I'm open to brainstorming this a bit as well: if you and your teammates wanted to jump on a Zoom with me to brainstorm, I would be up for that.

Let me know your thoughts? :thinking:

shaneutt avatar Aug 02 '23 17:08 shaneutt

Hi Shane,

We've actually debated the "Route" part as well. It seems that in GW API a *Route like httproute or tcproute expects an association with a k8s "service" that fronts application pods. In the case of egress, at least in our implementation, there is no k8s service, only a Virtual Server and some network routing that we configure to send all egressing traffic to our Virtual Server, which then redirects traffic to the appropriate external networks.

I think the team is more than open to a zoom session. I'll run it by them to see who can/should attend.

Thanks for the offer! This will help us in our planning.

Quang

quangnguyen101 avatar Aug 02 '23 18:08 quangnguyen101

Can we try to do a zoom next week?

Thanks. Quang

quangnguyen101 avatar Aug 03 '23 15:08 quangnguyen101

Can we try to do a zoom next week?

Yes, hit me up on k8s slack

shaneutt avatar Aug 03 '23 15:08 shaneutt

Also after our discussion today, I would recommend we bring some of this discussion to the bi-monthly SIG Network sync, as this is steeped in more low-level and traditional networking concepts, and the group there is well versed. Recently when discussing routability the SIG Network group pointed us more towards the multi-network project, and generally speaking I think what you're getting into with this GEP has some very interesting and far reaching implications.

shaneutt avatar Aug 14 '23 18:08 shaneutt

Hi Shane,

Thanks again for chatting with us on Monday*. Sorry for the delayed response. Again, we really appreciate your time and attention. Everyone is busy with summer and the GA target looming, it's understandable. I have passed along your notes to the team. I added my f5 email...maybe that will speed up my response. :)

I will make the updates to the GEP soon and add the SIG Network sync to my calendar.

Thanks. Q

quangnguyen101 avatar Aug 17 '23 15:08 quangnguyen101

New changes are detected. LGTM label has been removed.

k8s-ci-robot avatar Aug 21 '23 21:08 k8s-ci-robot

Once #2689 merges, this will need a rebase and update - the GEP files have moved.

youngnick avatar Dec 22 '23 02:12 youngnick

Keywords which can automatically close issues and at(@) or hashtag(#) mentions are not allowed in commit messages.

The list of commits with invalid commit messages:

k8s-ci-robot avatar Mar 12 '24 18:03 k8s-ci-robot

@quangnguyen101: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name | Commit | Details | Required | Rerun command
pull-gateway-api-test | ff40855997de5827bf515b45737c934f0ed84a54 | link | true | /test pull-gateway-api-test
pull-gateway-api-verify | ff40855997de5827bf515b45737c934f0ed84a54 | link | true | /test pull-gateway-api-verify

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

k8s-ci-robot avatar Mar 12 '24 18:03 k8s-ci-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle stale
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Jun 10 '24 18:06 k8s-triage-robot

There hasn't been activity on this in a long while, and some of the fundamental problems (for instance, not having enough allied implementations that need this yet) still remain. I think at this point it's pretty fair to say that we should probably close this one for now?

/close

@quangnguyen101 if you disagree, however, please do feel free to re-open if you're still ready to push! And in any case, I personally haven't forgotten about egress, I'm just wondering if it's something we need to work on at a different level (that is to say, perhaps outside of Gateway API?).

shaneutt avatar Jun 21 '24 15:06 shaneutt

@shaneutt: Closed this PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

k8s-ci-robot avatar Jun 21 '24 15:06 k8s-ci-robot

I'm just wondering if it's something we need to work on at a different level (that is to say, perhaps outside of Gateway API?)

This is what I've been wondering about too, and at a minimum better defining a problem space that makes sense to solve within Gateway API if applicable. I'm curious if some of the telco/IP use cases may be a better fit for an L3 solution in the Network Policy working group, whether that's extending AdminNetworkPolicy or some new resource, and even some common mesh functionality like FQDN egress filtering might make sense to implement over there too.

mikemorris avatar Jun 21 '24 16:06 mikemorris

I'm just wondering if it's something we need to work on at a different level (that is to say, perhaps outside of Gateway API?)

This is what I've been wondering about too, and at a minimum better defining a problem space that makes sense to solve within Gateway API if applicable.

:+1:

I'm curious if some of the telco/IP use cases may be a better fit for an L3 solution

Yes, @quangnguyen101 and some of his team (Philip Klatte in particular) and I talked at some length about the possibilities for KNI to help with their use cases. I think the multi-network project also potentially has some play here.

in the Network Policy working group, whether that's extending AdminNetworkPolicy or some new resource, and even some common mesh functionality like FQDN egress filtering might make sense to implement over there too.

I'm open to suggestions, but I guess I would need to hear a bit more :thinking: If nothing else, it can't hurt to ask that group for their thoughts on the subject. I've put an agenda item from us on the netpol meeting for July 2nd to at least float the idea with the group. If you can't make that one, let me know; I'm happy to wait until a time when we can both get there and have our coffee hot and ready :coffee:

shaneutt avatar Jun 21 '24 16:06 shaneutt

Hi @shaneutt @mikemorris, thanks for your continued attention to this GEP. Sorry, I have been busy on a high-priority, tight-schedule project, etc. etc. We are actually planning a GW API implementation for Egress in our product, along with the TCP/UDP and HTTP support that we've already implemented. @mikemorris can you please forward me the netpol meeting? Thanks!

quangnguyen101 avatar Jun 21 '24 17:06 quangnguyen101

Hey @quangnguyen101! Let's keep staying in touch. Here's the list of meetings which includes the details for the netpol meeting!

shaneutt avatar Jun 21 '24 18:06 shaneutt

Hi there, I'm relatively inexperienced with the work in this group, but was trying to look for this feature.

From a functional perspective, how is this proposal different from this Egress Gateway implementation example from Istio docs?

  1. In the example below, I believe they connected 2 TLSRoutes via the same gateway to achieve Egress control.
  2. Is this proposal focused on adding syntax to the Gateway to allow the same (or more) Egress control?

https://istio.io/latest/docs/tasks/traffic-management/egress/egress-gateway/#egress-gateway-for-https-traffic

[Screenshot 2024-07-02 at 9:10:41 AM]

hochoy avatar Jul 02 '24 16:07 hochoy

From a functional perspective, how is this proposal different from this Egress Gateway implementation example from Istio docs?

@hochoy This proposal would be to adopt functionality like that example within the actual Gateway API spec. Istio is using the TLSRoute CRD there, but with a bit of non-standard usage leveraging the extensibility available through the TLSRoute resource:

  • Adding an annotation to configure the Gateway with cluster-internal routability (refs https://github.com/kubernetes-sigs/gateway-api/issues/1651 attempting to bring this capability in-spec)
  • Using parentRefs on the first TLSRoute to attach to an Istio ServiceEntry CRD (Gateway API doesn't have a comparable resource like this currently)
  • Using backendRefs on the second TLSRoute to attach to an Istio Hostname CRD (feels comparable to a Kubernetes Service resource with type ExternalName, but there's been a lot of hesitancy around doing anything with ExternalName because of historical CVEs/vulnerabilities)

While this example makes logical sense composed like this, it does feel slightly more verbose (it requires deploying ~5 different resources) than I'd ideally like to see in spec.
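To make the out-of-spec parts concrete, here is a rough Python sketch, not the actual Istio manifests. It models the two TLSRoutes as plain dicts and flags every parentRefs/backendRefs entry whose API group is neither core Kubernetes nor Gateway API. The resource names, the hostname, and the helper functions are all hypothetical and chosen only for illustration. The `networking.istio.io` group with the ServiceEntry and Hostname kinds follows the description above. The defaults used when a reference omits group/kind (Gateway for parentRefs, core Service for backendRefs) are the ones Gateway API documents.

```python
GATEWAY_API_GROUP = "gateway.networking.k8s.io"

# Defaults Gateway API applies when a reference omits group/kind.
REF_DEFAULTS = {
    "parentRefs": (GATEWAY_API_GROUP, "Gateway"),
    "backendRefs": ("", "Service"),  # "" is the core Kubernetes API group
}
STANDARD_GROUPS = {"", GATEWAY_API_GROUP}


def collect_refs(route):
    """Return (field, ref) pairs for every parentRef and backendRef of a route."""
    spec = route["spec"]
    refs = [("parentRefs", ref) for ref in spec.get("parentRefs", [])]
    for rule in spec.get("rules", []):
        refs += [("backendRefs", ref) for ref in rule.get("backendRefs", [])]
    return refs


def non_standard_refs(route):
    """List (field, group, kind) for refs outside the core and Gateway API groups."""
    found = []
    for field, ref in collect_refs(route):
        default_group, default_kind = REF_DEFAULTS[field]
        group = ref.get("group", default_group)
        kind = ref.get("kind", default_kind)
        if group not in STANDARD_GROUPS:
            found.append((field, group, kind))
    return found


# Hypothetical first route: attaches to a ServiceEntry, sends traffic to the gateway.
to_gateway = {
    "kind": "TLSRoute",
    "metadata": {"name": "to-egress-gateway"},
    "spec": {
        "parentRefs": [
            {"group": "networking.istio.io", "kind": "ServiceEntry", "name": "external-site"}
        ],
        "rules": [{"backendRefs": [{"name": "example-egress-gateway", "port": 443}]}],
    },
}

# Hypothetical second route: attaches to the gateway, forwards to an external hostname.
from_gateway = {
    "kind": "TLSRoute",
    "metadata": {"name": "from-egress-gateway"},
    "spec": {
        "parentRefs": [{"name": "example-egress-gateway"}],
        "hostnames": ["external.example.com"],
        "rules": [
            {
                "backendRefs": [
                    {
                        "group": "networking.istio.io",
                        "kind": "Hostname",
                        "name": "external.example.com",
                        "port": 443,
                    }
                ]
            }
        ],
    },
}

for route in (to_gateway, from_gateway):
    print(route["metadata"]["name"], non_standard_refs(route))
```

Each route carries exactly one reference that a conformant implementation is not required to understand. That is roughly the gap an in-spec egress proposal would need to close.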

mikemorris avatar Jul 02 '24 16:07 mikemorris