ingress-nginx
Canary ingress makes service it uses unusable for other ingresses
We still observe this issue with the following ingress-nginx version:
```console
bash-5.1$ /nginx-ingress-controller --version
-------------------------------------------------------------------------------
NGINX Ingress controller
Release: v0.47.0
Build: 7201e37633485d1f14dbe9cd7b22dd380df00a07
Repository: https://github.com/kubernetes/ingress-nginx
nginx version: nginx/1.20.1
```
Steps to reproduce:

1. Create two services:

```yaml
---
apiVersion: v1
kind: Service
metadata:
  name: webclient-prod1
  namespace: prod
spec:
  ports:
    - name: http
      port: 80
      protocol: TCP
      targetPort: http
  selector:
    app.kubernetes.io/name: webclient-prod1
---
apiVersion: v1
kind: Service
metadata:
  name: webclient-prod2
  namespace: prod
spec:
  ports:
    - name: http
      port: 80
      protocol: TCP
      targetPort: http
  selector:
    app.kubernetes.io/name: webclient-prod2
```

2. Create two ingresses, one pointing to each of the services from step 1:

```yaml
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: webclient-prod1
  namespace: prod
spec:
  rules:
    - host: webclient.prod1.domain.com
      http:
        paths:
          - backend:
              service:
                name: webclient-prod1
                port:
                  number: 80
            path: /
            pathType: ImplementationSpecific
  tls:
    - hosts:
        - webclient.prod1.domain.com
      secretName: webclient-prod1-tls
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: webclient-prod2
  namespace: prod
spec:
  rules:
    - host: webclient.prod2.domain.com
      http:
        paths:
          - backend:
              service:
                name: webclient-prod2
                port:
                  number: 80
            path: /
            pathType: ImplementationSpecific
  tls:
    - hosts:
        - webclient.prod2.domain.com
      secretName: webclient-prod2-tls
```

3. Create two more ingresses that serve as the virtual IP ("VIP") domain, pointing to each of the services from step 1: one normal, and one with the nginx.ingress.kubernetes.io/canary: "true" annotation:

```yaml
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: webclient-prod1-vip
  namespace: prod
spec:
  rules:
    - host: webclient.domain.com
      http:
        paths:
          - backend:
              service:
                name: webclient-prod1
                port:
                  number: 80
            path: /
            pathType: ImplementationSpecific
  tls:
    - hosts:
        - webclient.domain.com
      secretName: webclient-prod1-vip-tls
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    nginx.ingress.kubernetes.io/canary: "true"
    nginx.ingress.kubernetes.io/canary-weight: "0"
  name: webclient-prod2-vip
  namespace: prod
spec:
  rules:
    - host: webclient.domain.com
      http:
        paths:
          - backend:
              service:
                name: webclient-prod2
                port:
                  number: 80
            path: /
            pathType: ImplementationSpecific
  tls:
    - hosts:
        - webclient.domain.com
      secretName: webclient-prod2-vip-tls
```
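Applying the manifests from the three steps above and sanity-checking them could look like this (the file names are placeholders for wherever the manifests are saved):

```bash
# Apply the Services and Ingresses from the steps above (file names are placeholders).
kubectl apply -f services.yaml -f ingresses.yaml -f vip-ingresses.yaml

# All four ingresses should be listed with their hosts and an address once admitted.
kubectl -n prod get ingress -o wide
```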
The idea is to always expose service webclient-prod1 on its own subdomain webclient.prod1.domain.com and service webclient-prod2 on subdomain webclient.prod2.domain.com. The VIP subdomain webclient.domain.com should normally point to service webclient-prod1, but can be partially or fully switched to service webclient-prod2 in a canary manner (by changing the nginx.ingress.kubernetes.io/canary-weight annotation value).
Expected behaviour: both the webclient.prod1.domain.com and webclient.prod2.domain.com subdomains always work and are connected to their corresponding services; webclient.domain.com also works and points to one of the services (or traffic is split between them as configured by the nginx.ingress.kubernetes.io/canary-weight annotation value).
Actual behaviour: the webclient.prod2.domain.com subdomain always points to the default backend service (404); the rest works as expected.
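For reference, the symptom can be reproduced with plain curl along these lines (the controller address is a placeholder; --resolve just pins the hosts to it):

```bash
# Placeholder address of the ingress-nginx controller service.
INGRESS_IP=203.0.113.10

# Works: served by webclient-prod1.
curl -sk --resolve webclient.prod1.domain.com:443:$INGRESS_IP \
  https://webclient.prod1.domain.com/ -o /dev/null -w '%{http_code}\n'

# Works: the VIP host, served by webclient-prod1 while canary-weight is 0.
curl -sk --resolve webclient.domain.com:443:$INGRESS_IP \
  https://webclient.domain.com/ -o /dev/null -w '%{http_code}\n'

# Broken: answered by the default backend (404) instead of webclient-prod2.
curl -sk --resolve webclient.prod2.domain.com:443:$INGRESS_IP \
  https://webclient.prod2.domain.com/ -o /dev/null -w '%{http_code}\n'
```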
From my understanding, this problem is caused by the bug discussed here, and the root cause is that the upstream for the canary ingress is not created and is marked with the "noServer": true flag (see below), even when another ingress uses the same service.
Output of the curl localhost:10246/configuration/backends command:
```json
[
{
"name": "prod-webclient-prod1-80",
"service": {
"metadata": {
"creationTimestamp": null
},
"spec": {
"ports": [
{
"name": "http",
"protocol": "TCP",
"port": 80,
"targetPort": "http"
}
],
"selector": {
"app.kubernetes.io/name": "webclient-prod1"
},
"clusterIP": "172.20.81.176",
"clusterIPs": [
"172.20.81.176"
],
"type": "ClusterIP",
"sessionAffinity": "None"
},
"status": {
"loadBalancer": {}
}
},
"port": 80,
"sslPassthrough": false,
"endpoints": [
{
"address": "10.8.23.179",
"port": "8080"
},
{
"address": "10.8.3.38",
"port": "8080"
}
],
"sessionAffinityConfig": {
"name": "",
"mode": "",
"cookieSessionAffinity": {
"name": ""
}
},
"upstreamHashByConfig": {
"upstream-hash-by-subset-size": 3
},
"noServer": false,
"trafficShapingPolicy": {
"weight": 0,
"header": "",
"headerValue": "",
"headerPattern": "",
"cookie": ""
},
"alternativeBackends": [
"prod-webclient-prod2-80"
]
},
{
"name": "prod-webclient-prod2-80",
"service": {
"metadata": {
"creationTimestamp": null
},
"spec": {
"ports": [
{
"name": "http",
"protocol": "TCP",
"port": 80,
"targetPort": "http"
}
],
"selector": {
"app.kubernetes.io/name": "webclient-prod2"
},
"clusterIP": "172.20.224.174",
"clusterIPs": [
"172.20.224.174"
],
"type": "ClusterIP",
"sessionAffinity": "None"
},
"status": {
"loadBalancer": {}
}
},
"port": 80,
"sslPassthrough": false,
"endpoints": [
{
"address": "10.8.29.221",
"port": "8080"
},
{
"address": "10.8.6.172",
"port": "8080"
}
],
"sessionAffinityConfig": {
"name": "",
"mode": "",
"cookieSessionAffinity": {
"name": ""
}
},
"upstreamHashByConfig": {
"upstream-hash-by-subset-size": 3
},
"noServer": true,
"trafficShapingPolicy": {
"weight": 0,
"header": "",
"headerValue": "",
"headerPattern": "",
"cookie": ""
}
},
{
"name": "upstream-default-backend",
"service": {
"metadata": {
"creationTimestamp": null
},
"spec": {
"ports": [
{
"name": "http",
"protocol": "TCP",
"port": 80,
"targetPort": "http"
}
],
"selector": {
"app.kubernetes.io/component": "default-backend",
"app.kubernetes.io/instance": "ingress-nginx-public",
"app.kubernetes.io/name": "ingress-nginx"
},
"clusterIP": "172.20.177.223",
"clusterIPs": [
"172.20.177.223"
],
"type": "ClusterIP",
"sessionAffinity": "None"
},
"status": {
"loadBalancer": {}
}
},
"port": 0,
"sslPassthrough": false,
"endpoints": [
{
"address": "10.8.12.58",
"port": "8080"
},
{
"address": "10.8.19.96",
"port": "8080"
}
],
"sessionAffinityConfig": {
"name": "",
"mode": "",
"cookieSessionAffinity": {
"name": ""
}
},
"upstreamHashByConfig": {},
"noServer": false,
"trafficShapingPolicy": {
"weight": 0,
"header": "",
"headerValue": "",
"headerPattern": "",
"cookie": ""
}
}
]
```
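A quick way to pull just the relevant flag out of that output (assuming jq is available wherever the JSON is inspected):

```bash
# Summarise which backends the controller considers routable.
# "noServer": true on prod-webclient-prod2-80 is exactly the problem described above.
curl -s localhost:10246/configuration/backends \
  | jq -r '.[] | "\(.name)\tnoServer=\(.noServer)"'
```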
One can be redirected to the default backend for more than one reason.
Please try controller version 0.50.0 and update with the following info:
- kubectl get all,ing -A -o wide
- kubectl -n prod describe po webclient-prod1
- kubectl -n prod describe po webclient-prod2
- kubectl -n prod describe svc webclient-prod1
- kubectl -n prod describe svc webclient-prod2
- kubectl -n prod describe ing webclient-prod1
- kubectl -n prod describe ing webclient-prod2
- kubectl -n prod describe ing webclient-prod1-vip
- kubectl -n prod describe ing webclient-prod2-vip
- kubectl -n ingresscontrollernamespace logs ingresscontrollerpodname
- Your complete and exact curl command as executed during the test, and the complete response to curl in verbose mode
@longwuyuan I'm sorry, but your request is not relevant. I provided the output of the curl localhost:10246/configuration/backends command, which clearly shows that no upstream is created for the webclient-prod2 service ("noServer": true).
This issue https://github.com/kubernetes/ingress-nginx/issues/4667 implies that the backend will be common to the 2 ingress objects, but based on the field ingress.spec.rules.hosts being different, the upstream configured in nginx.conf should be different. I will be happy to be proved wrong, but I think the value of the field ingress.spec.rules.http.paths.backend.service is used to configure the upstream, so it is not clear which field(s) would provide 2 different values to configure 2 different upstreams.
Hope some expert comments and makes progress for you.
On a different note, is this a real live use case?
Yes, this is the setup we wanted to use for our PROD environment, but unfortunately we couldn't, since a canary ingress makes the service it points to unusable for any other ingress. We eventually solved it by creating 2 additional services, which means more work to keep all services in sync.
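For anyone hitting the same wall, that kind of workaround roughly looks like the sketch below: a second Service with the same selector, referenced only by the canary ingress, so the original webclient-prod2 Service keeps its own upstream. The webclient-prod2-canary name is just a placeholder.

```bash
# Sketch of the workaround: a duplicate Service selecting the same pods,
# used only as the canary ingress backend. Names are illustrative.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: webclient-prod2-canary
  namespace: prod
spec:
  ports:
    - name: http
      port: 80
      protocol: TCP
      targetPort: http
  selector:
    app.kubernetes.io/name: webclient-prod2
EOF
```

The canary ingress webclient-prod2-vip then points at webclient-prod2-canary, while webclient.prod2.domain.com keeps using webclient-prod2, at the cost of keeping the two Services in sync.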
Can you point me to some documentation related to this concept of "the VIP subdomain"? This is the first time I have seen "VIP" used for a Kubernetes object of kind Service. It is of course a popular use case in non-Kubernetes infrastructures, so I want to read about it.
I believed that 'trafficShapingPolicy' in the backend info from curl localhost:10246/configuration/backends is what drives the canary behaviour. But the same backend is shared between ingress objects that point to one service, which can be seen from its naming convention.
I've run into this problem in our customers' production clusters several times, and it seems to be a frequently encountered problem.
I think you should add the nginx.conf of your controller pod here, just for clarity.
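Something along these lines dumps it (the controller namespace and deployment name are placeholders for whatever the installation uses):

```bash
# Dump the rendered nginx.conf from a running controller pod.
kubectl -n ingress-nginx exec deploy/ingress-nginx-controller \
  -- cat /etc/nginx/nginx.conf > nginx.conf
```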
@longwuyuan VIP is just a name for the (sub)domain; we could call it "main" instead. The idea is to switch the (sub)domain from one service to another in a blue/green manner (similar to how, in the past, people used to move an IP from one VM to another).
Please ignore the naming (it might not be the best) and concentrate on the bug instead...
Let me give another example:
```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: base
spec:
  rules:
    - host: test.example.com
      http:
        paths:
          - pathType: Prefix
            path: /
            backend:
              service:
                name: svc-prod
                port:
                  number: 80
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: base1
spec:
  rules:
    - host: test1.example.com
      http:
        paths:
          - pathType: Prefix
            path: /
            backend:
              service:
                name: svc-prod
                port:
                  number: 80
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: canary
  annotations:
    nginx.ingress.kubernetes.io/canary: "true"
    nginx.ingress.kubernetes.io/canary-by-header: "X-Canary"
    nginx.ingress.kubernetes.io/canary-by-header-value: "canary"
spec:
  rules:
    - host: test.example.com
      http:
        paths:
          - pathType: Prefix
            path: /
            backend:
              service:
                name: svc-canary
                port:
                  number: 80
```
We use the svc-prod service in the base and base1 ingresses for test.example.com and test1.example.com, and then create a canary ingress for test.example.com pointing to svc-canary.
We expected to reach svc-prod from test.example.com when not sending the canary header (X-Canary: canary), and from test1.example.com with any headers.
But in fact, when you access test1.example.com with the canary header, the traffic is redirected to svc-canary, which is not what we expect.
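For illustration, the unexpected routing can be observed with something like this (the controller address is a placeholder):

```bash
INGRESS_IP=203.0.113.10   # placeholder for the ingress-nginx address

# test1.example.com is not covered by the canary ingress at all...
curl -s --resolve test1.example.com:80:$INGRESS_IP http://test1.example.com/

# ...yet sending the canary header here also lands on svc-canary, because the canary
# rule is attached to the shared svc-prod upstream rather than to the host/path.
curl -s --resolve test1.example.com:80:$INGRESS_IP \
  -H 'X-Canary: canary' http://test1.example.com/
```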
But the situation above seems kinda different from mine, and I'm not sure if they're related now :(
Just trying to understand what problem you want to solve. Looks like you want blue/green and canary working together.
Thanks, Long Wu Yuan
Not really; basically we want both services, webclient-prod1 and webclient-prod2, to always be exposed on their own subdomains (ingresses) webclient.prod1.domain.com and webclient.prod2.domain.com, and to have another main/VIP/you-name-it subdomain (ingress) webclient.domain.com which we point to either the webclient-prod1 service or the webclient-prod2 service (or both, with a canary traffic split) as required by business logic/needs.
Unfortunately, because of the bug I described above, the subdomain (ingress) webclient.prod2.domain.com always points to the default backend service instead of the webclient-prod2 service, because no upstream is created for the webclient-prod2 service (it is marked with the "noServer": true flag). I hope it makes more sense now.
I think I know what's going on with @Zyava's problem. I dug into the code and found that NoServer means "skip generating the server block in nginx.conf for this upstream", but it actually behaves as "skip generating the location block for this upstream", see https://github.com/kubernetes/ingress-nginx/blob/c0814c6f784e63f08768a935234afc201cf5a5f2/internal/ingress/controller/controller.go#L648-L650. That is reasonable, because a canary service doesn't need its own rule in the nginx configuration (it would conflict with the main upstream), but when a canary service is used by another ingress object as a main service, no rule is generated for it in nginx.conf either.
And it's clear that this is not the same problem as mine, but they share a common issue: canary should be associated with upstream + host + path, not only with the upstream.
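If that reading is right, the server block for webclient.prod2.domain.com should still exist but its location should be wired to the default backend. A rough way to check (controller namespace and deployment name are placeholders):

```bash
# With the bug present, this is expected to print upstream-default-backend
# rather than prod-webclient-prod2-80.
kubectl -n ingress-nginx exec deploy/ingress-nginx-controller -- cat /etc/nginx/nginx.conf \
  | grep -A 60 'server_name webclient.prod2.domain.com' \
  | grep proxy_upstream_name
```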
Wondering if you are requesting a new feature?
@longwuyuan I wonder if it is by design that canary takes effect at the upstream scope, which leads to the canary rule being applied to all ingress objects using the same service (which may not meet our expectations), while at the same time a canary service can't be used by another ingress as a primary upstream, because the controller will not generate rules for it in nginx.conf.
We've encountered the problem several times in real production environments. And it's also a good idea to implement it as a new feature, in my opinion.
There's a great solution to the problem in #4716, and I wonder if we can reopen it or create a new PR based on that work.
I don't understand how this can be a new feature and not a bug. A service used in a canary ingress can't be used in any other ingress; how can that be considered normal? Obviously, this "feature" (the implications of a canary ingress) is currently not documented anywhere...
We can re-apply the bug label when the triaging is completed. Basic canary functionality seems to be working. @theunrealgeek, any comments on this?
/assign
/triage accepted /priority backlog
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
I think the canary functionality is incomplete without the requested update. A lot of the time it is desirable to route specifically to the canary service (webclient-prod2) so that one can test the canary service, run pre-prod validations, etc. The canary weight at this point would be '0'. Once initial testing is complete, we can start using the canary features by increasing the canary weight. Even when a traffic shift is enabled, we would still like to continue testing against the canary.
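One way to sketch that testing flow, reusing the webclient-prod2-vip canary ingress from the original report (the header name is arbitrary, and this only illustrates the existing canary-by-header annotation, not a fix for the bug above):

```bash
# Add a header rule on top of the zero weight so testers can opt in explicitly.
kubectl -n prod annotate ingress webclient-prod2-vip \
  nginx.ingress.kubernetes.io/canary-by-header=X-Canary --overwrite

# With canary-by-header, the literal value "always" forces the request to the canary
# backend while canary-weight stays at 0 for everyone else.
curl -H 'X-Canary: always' https://webclient.domain.com/
```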
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Reopen this issue with /reopen
- Mark this issue as fresh with /remove-lifecycle rotten
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
@Zyava hello! Did you find any workarounds for this?
Nope, unfortunately I don't know how to work around this without switching to another ingress controller…
Best regards,
Dmytro