ingress-nginx

Support Kubernetes EndpointSlices

Open raravena80 opened this issue 4 years ago • 30 comments

Support Kubernetes EndpointSlices, a newer Kubernetes feature that allows restricting or customizing where traffic is sent in a cluster.

Background:

https://stackoverflow.com/questions/63399080/kubernetes-1-18-6-servicetopology-and-ingress-support

Not that I know of

K8s 1.17 and above (Beta): https://kubernetes.io/docs/concepts/services-networking/endpoint-slices

/kind feature

raravena80 avatar Aug 13 '20 19:08 raravena80

@raravena80 what are you trying to do exactly?

aledbf avatar Aug 13 '20 20:08 aledbf

K8s 1.17 and above (Beta): https://kubernetes.io/docs/concepts/services-networking/endpoint-slices

Right, but for some context, the majority of the users are still running k8s < 1.16, even 1.13. Adding a feature like this one only adds complexity to the project.

Without a clear problem this feature could solve, I don't see the reason to add support, at least until users run k8s > 1.17

aledbf avatar Aug 13 '20 20:08 aledbf

@aledbf this is based on this Stackoverflow question: https://stackoverflow.com/questions/63399080/kubernetes-1-18-6-servicetopology-and-ingress-support

Thanks!

raravena80 avatar Aug 13 '20 21:08 raravena80

this is based on this Stackoverflow question: https://stackoverflow.com/questions/63399080/kubernetes-1-18-6-servicetopology-and-ingress-support

Interesting.

The question itself, about service topology, can be solved using the annotation service-upstream. That said, the source of the connection will be ingress-nginx, delegating the topology part to the k8s service topology feature (topologyKeys). But then you cannot have custom LB algorithms or sticky sessions.

The EndpointSlices part makes sense when you have services with a lot of endpoints (> 100).
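For illustration only, here is a minimal sketch of what an Ingress using that annotation could look like, built as a Python dict and printed as JSON (which kubectl accepts). The annotation key `nginx.ingress.kubernetes.io/service-upstream` is the documented one; the Ingress name, host, Service name, port, the `nginx` ingress class and the `networking.k8s.io/v1` API version (available from k8s 1.19) are assumptions made up for the example.

```python
import json


def ingress_with_service_upstream(name, host, service, port):
    """Build an Ingress manifest that asks ingress-nginx to proxy to the
    Service's ClusterIP instead of the individual pod endpoints."""
    return {
        "apiVersion": "networking.k8s.io/v1",  # assumption: cluster serves Ingress v1
        "kind": "Ingress",
        "metadata": {
            "name": name,
            "annotations": {
                # Annotation values are always strings.
                "nginx.ingress.kubernetes.io/service-upstream": "true",
            },
        },
        "spec": {
            "ingressClassName": "nginx",  # assumption: default class name
            "rules": [
                {
                    "host": host,
                    "http": {
                        "paths": [
                            {
                                "path": "/",
                                "pathType": "Prefix",
                                "backend": {
                                    "service": {
                                        "name": service,
                                        "port": {"number": port},
                                    }
                                },
                            }
                        ]
                    },
                }
            ],
        },
    }


# Hypothetical names, purely for the example.
manifest = ingress_with_service_upstream("demo", "demo.example.com", "demo-svc", 80)
print(json.dumps(manifest, indent=2))
```

With this in place the upstream is a single address (the Service), which is why per-endpoint features such as custom balancing or sticky sessions no longer apply.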

aledbf avatar Aug 13 '20 21:08 aledbf

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle stale

fejta-bot avatar Nov 11 '20 21:11 fejta-bot

/remove-lifecycle stale

raravena80 avatar Nov 11 '20 21:11 raravena80

EndpointSlices are a game changer, not only for the scalability benefits they bring to services with a lot of endpoints; they also bring performance improvements and cost savings in cloud environments like AWS.

It is possible to group endpoints per availability zone and, based on the identity of the nginx pod, prefer the endpoints in your own zone over those in other zones. This saves you money and boosts performance because the traffic stays in the same AZ.
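As a rough sketch of that idea (not how ingress-nginx implements anything), the selection could look like the following. The `Endpoint` type, the `prefer_local_zone` helper and the sample addresses and zone names are all hypothetical; the only thing taken from Kubernetes is that each endpoint in an EndpointSlice can carry a zone and a ready condition.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Endpoint:
    address: str
    zone: Optional[str]  # zone reported for the endpoint, if any
    ready: bool = True


def prefer_local_zone(endpoints: List[Endpoint], local_zone: str) -> List[Endpoint]:
    """Return ready endpoints in the caller's zone, or all ready endpoints
    when the local zone has none (so traffic is never black-holed)."""
    ready = [e for e in endpoints if e.ready]
    local = [e for e in ready if e.zone == local_zone]
    return local if local else ready


# Hypothetical data for the example.
endpoints = [
    Endpoint("10.0.1.10", "us-east-1a"),
    Endpoint("10.0.2.10", "us-east-1b"),
    Endpoint("10.0.1.11", "us-east-1a", ready=False),
    Endpoint("10.0.3.10", "us-east-1c"),
]

same_zone = prefer_local_zone(endpoints, "us-east-1a")
fallback = prefer_local_zone(endpoints, "us-east-1d")
```

Here `same_zone` contains only the ready endpoint in `us-east-1a`, while `fallback` contains every ready endpoint because no endpoint lives in `us-east-1d`. A real implementation would also have to avoid overloading a zone that has few endpoints.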

ltagliamonte-dd avatar Dec 15 '20 05:12 ltagliamonte-dd

this is based on this Stackoverflow question: https://stackoverflow.com/questions/63399080/kubernetes-1-18-6-servicetopology-and-ingress-support

Interesting.

The question itself, about service topology, can be solved using the annotation service-upstream. That said, the source of the connection will be ingress-nginx, delegating the topology part to the k8s service topology feature (topologyKeys). But then you cannot have custom LB algorithms or sticky sessions.

The EndpointSlices part makes sense when you have services with a lot of endpoints (> 100).

Please correct me @aledbf, but I believe it would make sense to consider endpoint slices and topology-aware routing in this project as well. K8s services are somewhat difficult to use in an HTTP/2 context, e.g. when using gRPC, due to its connection reuse/multiplexing. It is possible to use a headless service with DNS-based client-side load balancing, but this also comes with some issues, e.g. getting notified of new pods (which can be mitigated by lowering the TTL). The only clean solution here is working on endpoints directly. On the client side, there is a project for this, https://github.com/sercand/kuberesolver, even though it does not have support for endpoint slices and topology-aware routing yet.

So if we want to have topology-aware routing for HTTP/2 (which does make sense in many cases, especially for cost reduction in a multi-AZ environment), we might need to include some logic working on endpoint slices and certain routing preferences.

See also: https://github.com/zalando/skipper/issues/1446 https://github.com/linkerd/linkerd2/pull/4780

ecktom avatar Jan 11 '21 17:01 ecktom

So if we want to have topology-aware routing for HTTP/2 (which does make sense in many cases, especially for cost reduction in a multi-AZ environment), we might need to include some logic working on endpoint slices and certain routing preferences.

We have a KEP to add support for zone-aware routing, but such a feature requires massive changes on the Lua side of the controller.

Using topology-aware routing (from k8s) means you lose several features from ingress-nginx, like sticky sessions, due to the use of the k8s service abstraction instead of endpoints.

aledbf avatar Jan 11 '21 18:01 aledbf

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale

fejta-bot avatar Apr 11 '21 19:04 fejta-bot

/remove-lifecycle stale.

raravena80 avatar Apr 11 '21 20:04 raravena80

Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community. /lifecycle rotten

fejta-bot avatar May 11 '21 20:05 fejta-bot

/remove-lifecycle rotten.

raravena80 avatar May 11 '21 20:05 raravena80

Rotten issues close after 30d of inactivity. Reopen the issue with /reopen. Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-contributor-experience at kubernetes/community. /close

fejta-bot avatar Jun 10 '21 20:06 fejta-bot

@fejta-bot: Closing this issue.

k8s-ci-robot avatar Jun 10 '21 20:06 k8s-ci-robot

Let's not forget that Endpoints resources are going to be deprecated very soon.

ltagliamonte-dd avatar Jun 10 '21 21:06 ltagliamonte-dd

/reopen

This is still not fixed, and one can hit K8s control plane availability problems when there is high churn on large services in the cluster and lots of ingress-nginx-controller replicas: the apiserver needs to send notifications about endpoints changes to lots of watchers, which often ends up overloading it.
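To give a feel for the scale, here is a back-of-the-envelope sketch. All numbers are hypothetical (5000 endpoints, roughly 200 bytes per serialized endpoint, 50 controller replicas watching); the only Kubernetes fact used is that an EndpointSlice holds at most 100 endpoints by default, so a single pod change re-sends one slice rather than the whole Endpoints object.

```python
def fanout_bytes(endpoints_in_object: int, bytes_per_endpoint: int, watchers: int) -> int:
    """Approximate bytes the apiserver sends for ONE update of an object
    that is watched by `watchers` clients."""
    return endpoints_in_object * bytes_per_endpoint * watchers


# Hypothetical numbers for the example.
SERVICE_ENDPOINTS = 5000    # pods behind one large Service
BYTES_PER_ENDPOINT = 200    # rough serialized size of one endpoint entry
WATCHERS = 50               # ingress-nginx controller replicas
MAX_PER_SLICE = 100         # default EndpointSlice capacity

# One pod changes: the whole Endpoints object is re-sent to every watcher.
endpoints_cost = fanout_bytes(SERVICE_ENDPOINTS, BYTES_PER_ENDPOINT, WATCHERS)

# With EndpointSlices only the affected slice is re-sent.
slice_cost = fanout_bytes(MAX_PER_SLICE, BYTES_PER_ENDPOINT, WATCHERS)

print(f"Endpoints update:     {endpoints_cost / 1e6:.1f} MB")
print(f"EndpointSlice update: {slice_cost / 1e6:.1f} MB")
print(f"Reduction factor:     {endpoints_cost // slice_cost}x")
```

The absolute figures depend entirely on the assumed sizes; the point is that the per-update cost stops growing with the size of the Service once watchers consume slices.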

tosi3k avatar Jun 20 '22 09:06 tosi3k

@tosi3k: Reopened this issue.

k8s-ci-robot avatar Jun 20 '22 09:06 k8s-ci-robot

The lack of an EndpointSlices implementation is unfortunately impacting production for us now. Since Kubernetes v1.22, for Services that exceed 1000 Pods/network endpoints, the Endpoints object is truncated to a maximum of 1000 items.
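A small sketch of why slices avoid this: a consumer collects addresses from every EndpointSlice labelled with the Service name, so nothing is cut off at 1000. The dicts below mimic the shape of EndpointSlice objects (the `kubernetes.io/service-name` label, `endpoints[].addresses`, `endpoints[].conditions.ready`); the helper name, the Service name `big-svc` and the generated addresses are hypothetical, and a real controller would use a client library and a watch rather than plain dicts.

```python
from typing import Dict, Iterable, List

SERVICE_NAME_LABEL = "kubernetes.io/service-name"


def addresses_from_slices(slices: Iterable[Dict], service: str) -> List[str]:
    """Merge the ready addresses of all slices that belong to `service`."""
    seen = set()
    for s in slices:
        labels = s.get("metadata", {}).get("labels", {})
        if labels.get(SERVICE_NAME_LABEL) != service:
            continue
        for ep in s.get("endpoints", []):
            # An unset ready condition is treated as ready; only an
            # explicit False excludes the endpoint.
            if ep.get("conditions", {}).get("ready") is False:
                continue
            seen.update(ep.get("addresses", []))
    return sorted(seen)


# Hypothetical Service with 1500 pods, split into slices of 100 endpoints.
all_addresses = [f"10.0.{i // 250}.{i % 250 + 1}" for i in range(1500)]
slices = [
    {
        "metadata": {"name": f"big-svc-{n}", "labels": {SERVICE_NAME_LABEL: "big-svc"}},
        "addressType": "IPv4",
        "endpoints": [
            {"addresses": [addr], "conditions": {"ready": True}}
            for addr in all_addresses[n * 100:(n + 1) * 100]
        ],
    }
    for n in range(15)
]

merged = addresses_from_slices(slices, "big-svc")
print(len(slices), "slices,", len(merged), "addresses")
```

The deduplication through a set matters because the same endpoint can briefly appear in more than one slice while slices are being rebalanced.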

ottoyiu avatar Jun 29 '22 17:06 ottoyiu

/priority backlog
/triage accepted
/project Stabilization Project

strongjz avatar Jul 07 '22 15:07 strongjz

@strongjz: You must be a member of the kubernetes/ingress-nginx github team to set the project and column.

k8s-ci-robot avatar Jul 07 '22 15:07 k8s-ci-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

k8s-triage-robot avatar Aug 06 '22 16:08 k8s-triage-robot

@k8s-triage-robot: Closing this issue.

k8s-ci-robot avatar Aug 06 '22 16:08 k8s-ci-robot

/reopen

tosi3k avatar Aug 08 '22 07:08 tosi3k

@tosi3k: Reopened this issue.

k8s-ci-robot avatar Aug 08 '22 07:08 k8s-ci-robot

/close

k8s-triage-robot avatar Sep 07 '22 07:09 k8s-triage-robot

@k8s-triage-robot: Closing this issue.

k8s-ci-robot avatar Sep 07 '22 07:09 k8s-ci-robot

/reopen
/remove-lifecycle rotten
/lifecycle frozen

tosi3k avatar Sep 07 '22 08:09 tosi3k

@tosi3k: Reopened this issue.

k8s-ci-robot avatar Sep 07 '22 08:09 k8s-ci-robot

This feature is currently being worked on in https://github.com/kubernetes/ingress-nginx/pull/8890

/lifecycle frozen

strongjz avatar Sep 07 '22 13:09 strongjz