Canary ingress makes the service it uses unusable for other ingresses
We still observe this issue with the following ingress-nginx version:
bash-5.1$ /nginx-ingress-controller --version
-------------------------------------------------------------------------------
NGINX Ingress controller
Release: v0.47.0
Build: 7201e37633485d1f14dbe9cd7b22dd380df00a07
Repository: https://github.com/kubernetes/ingress-nginx
nginx version: nginx/1.20.1
Steps to reproduce:

1. Create 2 services:

---
apiVersion: v1
kind: Service
metadata:
  name: webclient-prod1
  namespace: prod
spec:
  ports:
  - name: http
    port: 80
    protocol: TCP
    targetPort: http
  selector:
    app.kubernetes.io/name: webclient-prod1
---
apiVersion: v1
kind: Service
metadata:
  name: webclient-prod2
  namespace: prod
spec:
  ports:
  - name: http
    port: 80
    protocol: TCP
    targetPort: http
  selector:
    app.kubernetes.io/name: webclient-prod2

2. Create 2 ingresses pointing to each of the services from step 1:

---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: webclient-prod1
  namespace: prod
spec:
  rules:
  - host: webclient.prod1.domain.com
    http:
      paths:
      - backend:
          service:
            name: webclient-prod1
            port:
              number: 80
        path: /
        pathType: ImplementationSpecific
  tls:
  - hosts:
    - webclient.prod1.domain.com
    secretName: webclient-prod1-tls
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: webclient-prod2
  namespace: prod
spec:
  rules:
  - host: webclient.prod2.domain.com
    http:
      paths:
      - backend:
          service:
            name: webclient-prod2
            port:
              number: 80
        path: /
        pathType: ImplementationSpecific
  tls:
  - hosts:
    - webclient.prod2.domain.com
    secretName: webclient-prod2-tls

3. Create 2 more ingresses which should serve as the virtual IP (VIP) domain, pointing to each of the services from step 1, one normal and one with the nginx.ingress.kubernetes.io/canary: "true" annotation:

---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: webclient-prod1-vip
  namespace: prod
spec:
  rules:
  - host: webclient.domain.com
    http:
      paths:
      - backend:
          service:
            name: webclient-prod1
            port:
              number: 80
        path: /
        pathType: ImplementationSpecific
  tls:
  - hosts:
    - webclient.domain.com
    secretName: webclient-prod1-vip-tls
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    nginx.ingress.kubernetes.io/canary: "true"
    nginx.ingress.kubernetes.io/canary-weight: "0"
  name: webclient-prod2-vip
  namespace: prod
spec:
  rules:
  - host: webclient.domain.com
    http:
      paths:
      - backend:
          service:
            name: webclient-prod2
            port:
              number: 80
        path: /
        pathType: ImplementationSpecific
  tls:
  - hosts:
    - webclient.domain.com
    secretName: webclient-prod2-vip-tls
The idea is to always expose service webclient-prod1 on its own subdomain webclient.prod1.domain.com and service webclient-prod2 on subdomain webclient.prod2.domain.com. The VIP subdomain webclient.domain.com should normally point to service webclient-prod1, but can be partially or fully switched to service webclient-prod2 in a canary manner (by changing the nginx.ingress.kubernetes.io/canary-weight annotation value).
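For example, switching the VIP fully over to webclient-prod2 would amount to something like the following (a sketch; only the canary-weight value actually matters here):

kubectl -n prod annotate ingress webclient-prod2-vip \
  nginx.ingress.kubernetes.io/canary-weight="100" --overwrite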
Expected behaviour: both webclient.prod1.domain.com and webclient.prod2.domain.com subdomains always work and are connected to the corresponding services. webclient.domain.com also works and points to one of the services (or traffic is split between them as configured by the nginx.ingress.kubernetes.io/canary-weight annotation value).
Actual behaviour: the webclient.prod2.domain.com subdomain always points to the default backend service (404); everything else works as expected.
From my understanding, this problem is caused by the bug discussed here: the root cause is that the upstream for the canary ingress is not created and is marked with the "noServer": true flag (see below), even if there is another ingress which uses the same service.
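For reference, the dump below was obtained roughly like this (a sketch; the controller namespace and pod name are placeholders for our actual ones):

kubectl -n ingress-nginx exec -it <ingress-nginx-controller-pod> -- \
  curl -s localhost:10246/configuration/backends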
Output of curl localhost:10246/configuration/backends command:
[
{
"name": "prod-webclient-prod1-80",
"service": {
"metadata": {
"creationTimestamp": null
},
"spec": {
"ports": [
{
"name": "http",
"protocol": "TCP",
"port": 80,
"targetPort": "http"
}
],
"selector": {
"app.kubernetes.io/name": "webclient-prod1"
},
"clusterIP": "172.20.81.176",
"clusterIPs": [
"172.20.81.176"
],
"type": "ClusterIP",
"sessionAffinity": "None"
},
"status": {
"loadBalancer": {}
}
},
"port": 80,
"sslPassthrough": false,
"endpoints": [
{
"address": "10.8.23.179",
"port": "8080"
},
{
"address": "10.8.3.38",
"port": "8080"
}
],
"sessionAffinityConfig": {
"name": "",
"mode": "",
"cookieSessionAffinity": {
"name": ""
}
},
"upstreamHashByConfig": {
"upstream-hash-by-subset-size": 3
},
"noServer": false,
"trafficShapingPolicy": {
"weight": 0,
"header": "",
"headerValue": "",
"headerPattern": "",
"cookie": ""
},
"alternativeBackends": [
"prod-webclient-prod2-80"
]
},
{
"name": "prod-webclient-prod2-80",
"service": {
"metadata": {
"creationTimestamp": null
},
"spec": {
"ports": [
{
"name": "http",
"protocol": "TCP",
"port": 80,
"targetPort": "http"
}
],
"selector": {
"app.kubernetes.io/name": "webclient-prod2"
},
"clusterIP": "172.20.224.174",
"clusterIPs": [
"172.20.224.174"
],
"type": "ClusterIP",
"sessionAffinity": "None"
},
"status": {
"loadBalancer": {}
}
},
"port": 80,
"sslPassthrough": false,
"endpoints": [
{
"address": "10.8.29.221",
"port": "8080"
},
{
"address": "10.8.6.172",
"port": "8080"
}
],
"sessionAffinityConfig": {
"name": "",
"mode": "",
"cookieSessionAffinity": {
"name": ""
}
},
"upstreamHashByConfig": {
"upstream-hash-by-subset-size": 3
},
"noServer": true,
"trafficShapingPolicy": {
"weight": 0,
"header": "",
"headerValue": "",
"headerPattern": "",
"cookie": ""
}
},
{
"name": "upstream-default-backend",
"service": {
"metadata": {
"creationTimestamp": null
},
"spec": {
"ports": [
{
"name": "http",
"protocol": "TCP",
"port": 80,
"targetPort": "http"
}
],
"selector": {
"app.kubernetes.io/component": "default-backend",
"app.kubernetes.io/instance": "ingress-nginx-public",
"app.kubernetes.io/name": "ingress-nginx"
},
"clusterIP": "172.20.177.223",
"clusterIPs": [
"172.20.177.223"
],
"type": "ClusterIP",
"sessionAffinity": "None"
},
"status": {
"loadBalancer": {}
}
},
"port": 0,
"sslPassthrough": false,
"endpoints": [
{
"address": "10.8.12.58",
"port": "8080"
},
{
"address": "10.8.19.96",
"port": "8080"
}
],
"sessionAffinityConfig": {
"name": "",
"mode": "",
"cookieSessionAffinity": {
"name": ""
}
},
"upstreamHashByConfig": {},
"noServer": false,
"trafficShapingPolicy": {
"weight": 0,
"header": "",
"headerValue": "",
"headerPattern": "",
"cookie": ""
}
}
]
One can be redirected to the default backend for more than one reason.
Please try controller version 0.50.0 and update with the following info:
- kubectl get all,ing -A -o wide
- kubectl -n prod describe po webclient-prod1
- kubectl -n prod describe po webclient-prod2
- kubectl -n prod describe svc webclient-prod1
- kubectl -n prod describe svc webclient-prod2
- kubectl -n prod describe ing webclient-prod1
- kubectl -n prod describe ing webclient-prod2
- kubectl -n prod describe ing webclient-prod1-vip
- kubectl -n prod describe ing webclient-prod2-vip
- kubectl -n ingresscontrollernamespace logs ingresscontrollerpodname
- Your complete and exact curl command as executed during the test, and the complete response to curl in verbose mode (a sketch of such a command follows this list)
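As an example, such a test would look roughly like this (a sketch; the load-balancer IP is a placeholder):

curl -v --resolve webclient.prod2.domain.com:443:<ingress-lb-ip> \
  https://webclient.prod2.domain.com/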
@longwuyuan I'm sorry, but your request is not relevant. I provided the output of the curl localhost:10246/configuration/backends command, which clearly shows that there is no upstream created for the webclient-prod2 service ("noServer": true).
This issue https://github.com/kubernetes/ingress-nginx/issues/4667 implies that the backend will be common to the 2 ingress objects, but that, based on the field ingress.spec.rules.host being different, the upstream configured in nginx.conf should be different. I will be happy to be proved wrong, but I think the value of the field ingress.spec.rules.http.paths.backend.service is used to configure the upstream. So it is not clear which field(s) would provide 2 different values to configure 2 different upstreams.
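One way to see which fields end up in the upstream name is to look at the names in the backends dump above (a sketch; assumes jq is available and the dump was saved to backends.json):

jq -r '.[].name' backends.json
# prod-webclient-prod1-80
# prod-webclient-prod2-80
# upstream-default-backend

This suggests the name is built as <namespace>-<serviceName>-<servicePort>, with no host component, which would explain why two ingresses with different hosts but the same backend service share one upstream.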
Hope some expert comments and helps make progress for you.
On a different note, is this a real live use case?
Yes, this is the setup we wanted to use for our PROD environment, but unfortunately we couldn't, since a canary ingress makes the service it points to unusable for any other ingress. We eventually solved it by creating 2 additional services, which leads to more work to keep all services in sync.
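For completeness, the workaround was essentially to duplicate each service so that the canary ingress never shares a Service object with a non-canary ingress; a minimal sketch of one such duplicate (the name is ours, the selector mirrors the original):

---
apiVersion: v1
kind: Service
metadata:
  name: webclient-prod2-canary
  namespace: prod
spec:
  ports:
  - name: http
    port: 80
    protocol: TCP
    targetPort: http
  selector:
    app.kubernetes.io/name: webclient-prod2

The canary VIP ingress then points at webclient-prod2-canary, leaving webclient-prod2 free for its own subdomain ingress.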
Can you point me to some documentation related to this concept of "the VIP subdomain"? This is the first time I have seen "VIP" used for a Kubernetes object of kind Service. It is of course a popular use case in non-Kubernetes infrastructures, so I want to read about it.
I believe that 'trafficShapingPolicy' in the backend info from curl localhost:10246/configuration/backends is used for the canary behavior.
But the same backend is shared between ingress objects that point to one service, which can be seen from its naming convention (namespace-service-port, e.g. prod-webclient-prod2-80).
I've run into this problem in our customer's production cluster several times, and it seems to be a frequently encountered problem.
I think you should add the nginx.conf of your controller pod here, just for clarity.
@longwuyuan VIP is just a name for the (sub)domain; we could name it "main" instead. The idea is to switch the (sub)domain from one service to another in a blue/green manner (similar to how, in the past, people used to move an IP from one VM to another).
Please ignore the naming (it might not be the best) and concentrate on the bug instead...
Let me give another example:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: base
spec:
  rules:
  - host: test.example.com
    http:
      paths:
      - pathType: Prefix
        path: /
        backend:
          service:
            name: svc-prod
            port:
              number: 80
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: base1
spec:
  rules:
  - host: test1.example.com
    http:
      paths:
      - pathType: Prefix
        path: /
        backend:
          service:
            name: svc-prod
            port:
              number: 80
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: canary
  annotations:
    nginx.ingress.kubernetes.io/canary: "true"
    nginx.ingress.kubernetes.io/canary-by-header: "X-Canary"
    nginx.ingress.kubernetes.io/canary-by-header-value: "canary"
spec:
  rules:
  - host: test.example.com
    http:
      paths:
      - pathType: Prefix
        path: /
        backend:
          service:
            name: svc-canary
            port:
              number: 80
We use the svc-prod service in the base and base1 ingresses for test.example.com and test1.example.com,
and then create a canary ingress for test.example.com pointing to svc-canary.
We expected that we could access svc-prod from test.example.com without the canary header (X-Canary: canary) and from test1.example.com with any headers.
But in fact, when you access test1.example.com with the canary header, the traffic is redirected to svc-canary, which is not what we expect.
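To make the unexpected routing concrete, tests roughly like these show it (a sketch; the ingress address is a placeholder):

# expected: always svc-prod, since test1.example.com has no canary ingress of its own
curl --resolve test1.example.com:80:<ingress-address> http://test1.example.com/
# actual: with the canary header the response comes from svc-canary
curl -H "X-Canary: canary" --resolve test1.example.com:80:<ingress-address> http://test1.example.com/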
But the situation above seems kinda different from mine, and I'm not sure if they're related now :(
Just trying to understand what problem you want to solve. Looks like you want blue/green and canary working together.
Not really. Basically, we want both services webclient-prod1 and webclient-prod2 to always be exposed on their own subdomains (ingresses) webclient.prod1.domain.com and webclient.prod2.domain.com, and to have another main/VIP/you-name-it subdomain (ingress) webclient.domain.com which we point to either the webclient-prod1 service or the webclient-prod2 service (or both, with a canary traffic split) as required by business logic/needs.
Unfortunately, because of the bug I described above, the subdomain (ingress) webclient.prod2.domain.com always points to the default backend service instead of the webclient-prod2 service, because no upstream is created for the webclient-prod2 service (it is marked with the "noServer": true flag). I hope it makes more sense now.
I think I know what's going on with @Zyava's problem. I dug into the code and found that NoServer means "skip generating a server block in nginx.conf for this upstream", but it actually behaves as "skip generating a location block for this upstream", see https://github.com/kubernetes/ingress-nginx/blob/c0814c6f784e63f08768a935234afc201cf5a5f2/internal/ingress/controller/controller.go#L648-L650. That is reasonable, because a canary service doesn't need its own rule in the nginx configuration (it would conflict with the main upstream), but when a canary service is used by another ingress object as a main service, no rule is generated for it in nginx.conf.
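A rough illustration of the effect described above (a sketch only, not the actual controller code; the type and names are simplified stand-ins):

package main

import "fmt"

// Upstream is a minimal stand-in for the controller's backend type (sketch).
type Upstream struct {
	Name     string
	NoServer bool // true for upstreams created by a canary Ingress
}

func main() {
	upstreams := []Upstream{
		{Name: "prod-webclient-prod1-80", NoServer: false},
		{Name: "prod-webclient-prod2-80", NoServer: true}, // canary target
	}
	for _, u := range upstreams {
		if u.NoServer {
			// Intended: skip the dedicated server block for the canary upstream.
			// Observed effect: no location is generated either, so an Ingress
			// that uses this Service as its primary backend falls through to
			// the default backend (404).
			continue
		}
		fmt.Printf("would generate server/location blocks for %s\n", u.Name)
	}
}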
And it's clearly not the same problem as mine, but they share a common issue: canary should be associated with upstream + host + path, not only with the upstream.
Wondering if you are requesting a new feature
@longwuyuan I wonder if it's by design that canary takes effect at upstream scope, so that the canary rule is applied to all ingress objects using the same service (which may not meet our expectations), and at the same time a canary service can't be used by another ingress as a primary upstream, because the controller will not generate rules for it in nginx.conf.
We've encountered the problem several times in real production environments. And it would also be a good idea to implement this as a new feature, in my opinion.
There's a great solution to the problem in #4716, and I wonder if we can reopen it or create a new PR based on that work.
I don't understand how this can be a new feature and not a bug. A service used in a canary ingress can't be used in any other ingress - how can that be considered normal? Obviously, this "feature" (the implications of canary ingresses) is currently not documented anywhere...
We can re-apply the bug label when the triaging is completed. Basic canary functionality seems to be working. @theunrealgeek, any comments on this?
/assign
/triage accepted /priority backlog
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
I think the canary functionality is incomplete without the requested update. A lot of the time it is desirable to route specifically to the canary service (webclient-prod2) so that one can test the canary service, run pre-prod validations, etc. The canary weight at this point would be '0'. Once initial testing is complete, we can start using the canary features by increasing the canary weight. Even when a traffic shift is enabled, we would still like to continue testing against the canary.
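For example, the documented canary-by-header annotation can be combined with a zero weight so that only requests carrying the header reach the canary (a sketch using the names from this thread):

metadata:
  annotations:
    nginx.ingress.kubernetes.io/canary: "true"
    nginx.ingress.kubernetes.io/canary-weight: "0"
    nginx.ingress.kubernetes.io/canary-by-header: "X-Canary"

Requests with "X-Canary: always" go to the canary service; everything else follows the weight (0, i.e. it stays on the primary). Of course, this still runs into the problem reported here if the canary service is also referenced by another ingress.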
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Reopen this issue with /reopen
- Mark this issue as fresh with /remove-lifecycle rotten
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
@Zyava hello! Did you find any workarounds for this?
Nope, unfortunately I don't know how to work around this without switching to another ingress controller... Best regards, Dmytro