ingress-nginx
Support Kubernetes EndpointSlices
Support Kubernetes EndpointSlices, a newer Kubernetes feature that allows restricting or customizing where traffic is sent in a cluster.
Background:
https://stackoverflow.com/questions/63399080/kubernetes-1-18-6-servicetopology-and-ingress-support
Not that I know of
K8s 1.17 and above (Beta): https://kubernetes.io/docs/concepts/services-networking/endpoint-slices
/kind feature
@raravena80 what are you trying to do exactly?
K8s 1.17 and above (Beta): https://kubernetes.io/docs/concepts/services-networking/endpoint-slices
Right, but for some context, the majority of the users are still running k8s < 1.16, even 1.13. Adding a feature like this one only adds complexity to the project.
Without a clear problem this feature could solve, I don't see a reason to add support, at least not until users run k8s > 1.17
@aledbf this is based on this Stackoverflow question: https://stackoverflow.com/questions/63399080/kubernetes-1-18-6-servicetopology-and-ingress-support
Thanks!
this is based on this Stackoverflow question: https://stackoverflow.com/questions/63399080/kubernetes-1-18-6-servicetopology-and-ingress-support
Interesting.
The question itself, about service topology, can be solved using the service-upstream annotation. That said, the source of the connection will then be ingress-nginx, delegating the topology part to the k8s service topology feature (topologyKeys). But then you cannot have custom LB algorithms or sticky sessions.
The EndpointSlices part makes sense when you have services with a lot of endpoints (> 100).
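For illustration only, here is a minimal client-go sketch of the two upstream models described above: proxying to the Service ClusterIP (kube-proxy and topologyKeys decide the destination, but custom LB algorithms and sticky sessions are lost) versus resolving the pod endpoints directly. This is not ingress-nginx code, and the `demo`/`my-svc` names are placeholders.

```go
// Sketch: contrasts the "service-upstream" style (use the ClusterIP and let
// kube-proxy apply topologyKeys) with endpoint-based upstreams (what
// ingress-nginx normally builds its load-balancing table from).
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)
	ctx := context.Background()

	// Option 1: a single upstream, the Service ClusterIP. Topology handling is
	// delegated to kube-proxy, but per-endpoint LB and sticky sessions are impossible.
	svc, err := client.CoreV1().Services("demo").Get(ctx, "my-svc", metav1.GetOptions{})
	if err != nil {
		panic(err)
	}
	fmt.Println("service upstream:", svc.Spec.ClusterIP)

	// Option 2: one upstream per ready pod, read from the Endpoints object.
	// This is what enables custom balancing, but every watcher sees every change.
	eps, err := client.CoreV1().Endpoints("demo").Get(ctx, "my-svc", metav1.GetOptions{})
	if err != nil {
		panic(err)
	}
	for _, subset := range eps.Subsets {
		for _, addr := range subset.Addresses {
			fmt.Println("pod upstream:", addr.IP)
		}
	}
}
```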
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
/remove-lifecycle stale
EndpointSlices are a game changer, not only for the scalability benefits they bring for services with a lot of endpoints; they also bring performance improvements and cost savings in cloud environments like AWS.
It is possible to group endpoints per availability zone, and based on the identity of the nginx pod you can prefer the endpoints in your own zone over the ones in other zones. This saves money and boosts performance because traffic stays within the same AZ.
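A rough sketch of the kind of zone preference described above, using the discovery/v1 API. This assumes the controller already knows its own zone (e.g. via the downward API); it is not existing ingress-nginx code.

```go
// Sketch: given the EndpointSlices of a Service, prefer endpoints in the same
// availability zone as this nginx pod, falling back to all endpoints otherwise.
package zonefilter

import (
	discoveryv1 "k8s.io/api/discovery/v1"
)

// preferLocalZone returns only the ready endpoint addresses in localZone, or
// every ready address if none is local (to avoid black-holing traffic).
func preferLocalZone(slices []discoveryv1.EndpointSlice, localZone string) []string {
	var local, all []string
	for _, slice := range slices {
		for _, ep := range slice.Endpoints {
			// Skip endpoints explicitly marked not ready.
			if ep.Conditions.Ready != nil && !*ep.Conditions.Ready {
				continue
			}
			all = append(all, ep.Addresses...)
			if ep.Zone != nil && *ep.Zone == localZone {
				local = append(local, ep.Addresses...)
			}
		}
	}
	if len(local) > 0 {
		return local
	}
	return all
}
```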
this is based on this Stackoverflow question: https://stackoverflow.com/questions/63399080/kubernetes-1-18-6-servicetopology-and-ingress-support
Interesting.
The question itself, about service topology, can be solved using the service-upstream annotation. That said, the source of the connection will then be ingress-nginx, delegating the topology part to the k8s service topology feature (topologyKeys). But then you cannot have custom LB algorithms or sticky sessions.
The EndpointSlices part makes sense when you have services with a lot of endpoints (> 100).
Please correct me @aledbf, but I believe it would make sense to consider EndpointSlices and topology-aware routing in this project as well. K8s services are kind of difficult to use in an HTTP/2 context, e.g. when using gRPC, due to its connection reuse/multiplexing. There is the possibility to use a headless service and DNS-based client load balancing, but this also comes with some issues, e.g. getting notified of new pods (this can be mitigated by lowering the TTL). The only clean solution here is working on endpoints directly. On the client side there is a project for this, https://github.com/sercand/kuberesolver, even though it does not have support for EndpointSlices and topology-aware routing yet.
So if we want to have topology-aware routing (which does make sense for many cases, especially cost reduction in a multi-AZ environment) for HTTP/2, we might need to include some logic working on EndpointSlices and certain routing preferences.
See also: https://github.com/zalando/skipper/issues/1446 https://github.com/linkerd/linkerd2/pull/4780
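For context on the client-side approach mentioned above, this is roughly how kuberesolver is typically wired into a gRPC client today (Endpoints-based, with no EndpointSlice or topology awareness yet). The target name, port and the module's major-version import path are assumptions; check the project's README for the exact values.

```go
// Sketch: client-side load balancing for gRPC against a headless Service,
// using the endpoints-based resolver from github.com/sercand/kuberesolver.
package grpcclient

import (
	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"

	// Major version / module path may differ depending on the release in use.
	"github.com/sercand/kuberesolver/v5"
)

func dial() (*grpc.ClientConn, error) {
	// Registers the "kubernetes" resolver scheme, which watches Endpoints in-cluster.
	kuberesolver.RegisterInCluster()

	// Spread requests across all pod endpoints instead of pinning a single
	// multiplexed HTTP/2 connection to whatever the Service VIP picked first.
	return grpc.Dial(
		"kubernetes:///my-grpc-svc.my-namespace:50051", // placeholder target
		grpc.WithTransportCredentials(insecure.NewCredentials()),
		grpc.WithDefaultServiceConfig(`{"loadBalancingConfig":[{"round_robin":{}}]}`),
	)
}
```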
So if we want to have topology-aware routing (which does make sense for many cases, especially cost reduction in a multi-AZ environment) for HTTP/2, we might need to include some logic working on EndpointSlices and certain routing preferences.
We have a KEP to add support for zone-aware routing, but such a feature requires massive changes on the Lua side of the controller.
Using topology-aware routing (from k8s) means you lose several features of ingress-nginx, like sticky sessions, due to the use of the k8s Service abstraction instead of endpoints.
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.
Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
/remove-lifecycle rotten
Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.
Send feedback to sig-contributor-experience at kubernetes/community.
/close
@fejta-bot: Closing this issue.
Let's not forget that Endpoints resources are going to be deprecated very soon.
/reopen
This is still not fixed, and one can hit K8s control plane availability problems when there is high churn on large Services in the cluster and lots of ingress-nginx-controller replicas: the apiserver needs to send notifications about Endpoints changes to lots of watchers, which often ends up overloading it.
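To illustrate the watch-load argument: a controller replica that consumes EndpointSlices only receives the slices that actually changed (each capped at roughly 100 endpoints by default) instead of the full Endpoints object of a large Service on every update. A minimal client-go sketch, not the actual controller code:

```go
// Sketch: watching EndpointSlices instead of Endpoints with a shared informer,
// so each update event carries a single slice rather than the whole endpoint set.
package main

import (
	"time"

	discoveryv1 "k8s.io/api/discovery/v1"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/cache"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	factory := informers.NewSharedInformerFactory(client, 10*time.Minute)
	informer := factory.Discovery().V1().EndpointSlices().Informer()

	informer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		UpdateFunc: func(oldObj, newObj interface{}) {
			slice := newObj.(*discoveryv1.EndpointSlice)
			// Only this slice needs to be re-synced into the load-balancer
			// state, not the whole Service's endpoint set.
			_ = slice
		},
	})

	stop := make(chan struct{})
	defer close(stop)
	factory.Start(stop)
	factory.WaitForCacheSync(stop)
	select {}
}
```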
@tosi3k: Reopened this issue.
The lack of an EndpointSlices implementation is unfortunately impacting production for us now. Since Kubernetes v1.22, for Services that exceed 1000 Pods/network endpoints, the Endpoints object is truncated to a maximum of 1000 items.
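For reference, the untruncated endpoint set is still fully available through EndpointSlices, which are linked to their Service via the kubernetes.io/service-name label. A hedged sketch of listing them with client-go; the namespace and service name are placeholders:

```go
// Sketch: fetching every endpoint of a large Service from its EndpointSlices,
// which are not subject to the 1000-address truncation of the Endpoints object.
package main

import (
	"context"
	"fmt"

	discoveryv1 "k8s.io/api/discovery/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// All slices for one Service are selected by the service-name label.
	slices, err := client.DiscoveryV1().EndpointSlices("demo").List(context.Background(),
		metav1.ListOptions{LabelSelector: discoveryv1.LabelServiceName + "=my-big-svc"})
	if err != nil {
		panic(err)
	}

	total := 0
	for _, slice := range slices.Items {
		for _, ep := range slice.Endpoints {
			total += len(ep.Addresses)
		}
	}
	fmt.Printf("%d endpoints across %d slices\n", total, len(slices.Items))
}
```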
/priority backlog /triage accepted /project Stabilization Project
@strongjz: You must be a member of the kubernetes/ingress-nginx github team to set the project and column.
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Reopen this issue or PR with /reopen
- Mark this issue or PR as fresh with /remove-lifecycle rotten
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/close
@k8s-triage-robot: Closing this issue.
/reopen
@tosi3k: Reopened this issue.
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Reopen this issue or PR with /reopen
- Mark this issue or PR as fresh with /remove-lifecycle rotten
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/close
@k8s-triage-robot: Closing this issue.
/reopen /remove-lifecycle rotten /lifecycle frozen
@tosi3k: Reopened this issue.
This feature is currently being worked on in https://github.com/kubernetes/ingress-nginx/pull/8890
/lifecycle frozen