rollouts-plugin-trafficrouter-gatewayapi
Canary support with Argo Rollouts and the Kong traffic provider does not work seamlessly
Checklist:
Argo Rollouts image version: v1.7.0; Kong version: 3.5; Argo Rollouts app version: 2.37.2
Describe the bug
I tried canary support with Argo Rollouts using the Kong traffic provider. Everything works and the rollout happens, BUT after some time (not a fixed interval), other APIs routed through the same Gateway start failing, even though no changes were made to their HTTPRoutes.
To Reproduce
- Install Argo Rollouts
- Install the Kong Ingress Controller
- Follow this doc (https://github.com/argoproj-labs/rollouts-plugin-trafficrouter-gatewayapi/tree/main/examples/kong)
- Everything works as expected
- An existing API that shares the same Gateway starts returning 5xx errors, even though no changes were made to its HTTPRoute
- Deleting the HTTPRoute of the sample app created with the documentation above fixes the issue
Expected behavior
All routes should work without any issues.
Screenshots
We can't connect to the server for this app or website at this time. There might be too much traffic or a configuration error. Try again later, or contact the app or website owner.
Version
Argo Rollouts image version: v1.7.0; Argo Rollouts app version: 2.37.2
Logs
No specific errors in the Argo Rollouts controller when this issue happened.
Hi, so you installed Argo Rollouts and it works, but other APIs using the same Gateway stopped working, and those other APIs also use HTTPRoute resources? Can you tell us whether the HTTPRoute configurations have overlapping rules, please? It may be related to matching precedence: https://gateway-api.sigs.k8s.io/reference/spec/#gateway.networking.k8s.io/v1.HTTPRouteRule. Did you check it?
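A minimal Python sketch of the tie-breaking order from the linked HTTPRoute precedence rules: Exact path first, then the longest PathPrefix, then a method match, then the most header matches, then the most query parameter matches, then the oldest Route, then `{namespace}/{name}` alphabetical order, then the first matching rule in list order. It assumes the candidate matches that already apply to a request have been collected; the `CandidateMatch` model, its fields, and the route names and dates in the usage lines are hypothetical, and Kong implements this ordering in its own way.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class CandidateMatch:
    """Hypothetical model of one HTTPRoute rule match that applies to a request."""
    route: str                    # "{namespace}/{name}" of the owning HTTPRoute
    created: datetime             # creationTimestamp of the owning HTTPRoute
    rule_index: int               # position of the rule inside that HTTPRoute
    path_type: str                # "Exact" or "PathPrefix"
    path: str
    has_method: bool = False
    header_matches: int = 0
    query_param_matches: int = 0


def precedence_key(m: CandidateMatch) -> tuple:
    """Smaller keys win; each element is one tie-breaker, applied in spec order."""
    return (
        0 if m.path_type == "Exact" else 1,                  # Exact path match first
        -len(m.path) if m.path_type == "PathPrefix" else 0,  # then the longest prefix
        0 if m.has_method else 1,                            # then a method match
        -m.header_matches,                                   # then most header matches
        -m.query_param_matches,                              # then most query param matches
        m.created,                                           # then the oldest Route
        m.route,                                             # then "{namespace}/{name}" order
        m.rule_index,                                        # then the first rule in list order
    )


def winning_match(candidates: List[CandidateMatch]) -> Optional[CandidateMatch]:
    """Return the match that should receive the request, or None if nothing matched."""
    return min(candidates, key=precedence_key, default=None)


if __name__ == "__main__":
    cart = CandidateMatch("default/test-httproute", datetime(2024, 2, 1), 0,
                          "PathPrefix", "/api/v1/cart")
    catch_all = CandidateMatch("default/catch-all", datetime(2024, 1, 1), 0,
                               "PathPrefix", "/")
    print(winning_match([catch_all, cart]).route)  # default/test-httproute
```

Precedence only decides between matches that both apply to the same request, so a new HTTPRoute can take traffic away from an existing one only when their matches overlap and the new one ranks higher. With path prefixes that do not overlap, this ordering should not come into play.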
@som-kanade-zepto did you solve the problem? If yes, can you share your solution, please?
No, there is no overlap between the API paths of the 2 services that are failing.
@som-kanade-zepto did you check the logs of the Kong controller and the Argo Rollouts controller? Did you see any errors or warnings there?
I could only see 5xx errors in Kong: timeouts and very generic 5xx errors, nothing related to this.
Is your setup similar to the following? I created these two HTTPRoutes and a Rollout:
kind: HTTPRoute
apiVersion: gateway.networking.k8s.io/v1beta1
metadata:
  name: test-httproute
  annotations:
    konghq.com/strip-path: 'true'
spec:
  parentRefs:
    - kind: Gateway
      name: kong
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /api/v1/cart
      backendRefs:
        - name: argo-rollouts-stable-service
          kind: Service
          port: 80
          weight: 50
        - name: argo-rollouts-canary-service
          kind: Service
          port: 80
          weight: 50
---
kind: HTTPRoute
apiVersion: gateway.networking.k8s.io/v1beta1
metadata:
  name: argo-rollouts-test-httproute
  annotations:
    konghq.com/strip-path: 'true'
spec:
  parentRefs:
    - kind: Gateway
      name: kong
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /test
      backendRefs:
        - name: argo-rollouts-stable-service
          kind: Service
          port: 80
        - name: argo-rollouts-canary-service
          kind: Service
          port: 80
---
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: rollouts-demo
  namespace: default
spec:
  replicas: 3
  strategy:
    canary:
      canaryService: argo-rollouts-canary-service # our created canary service
      stableService: argo-rollouts-stable-service # our created stable service
      trafficRouting:
        plugins:
          argoproj-labs/gatewayAPI:
            httpRoute: argo-rollouts-test-httproute # our created httproute
            namespace: default
      steps:
        - setWeight: 30
        - pause: {}
        - setWeight: 60
        - pause: {}
        - setWeight: 100
        - pause: {}
  revisionHistoryLimit: 2
  selector:
    matchLabels:
      app: rollouts-demo
  template:
    metadata:
      labels:
        app: rollouts-demo
    spec:
      containers:
        - name: rollouts-demo
          image: kostiscodefresh/summer-of-k8s-app:v2 # change to v2 for next version
          ports:
            - name: http
              containerPort: 8080
              protocol: TCP
          resources:
            requests:
              memory: 32Mi
              cpu: 5m
I created these, and both routes seem to work: one managed by Argo Rollouts and one without it. Both use the same Gateway.
Ah, I am using Kong 3.6. I will try downgrading to check whether it works.
Yes @Philipp-Plotnikov, the routes are similar to the ones you posted. Can you please deploy something on the backend route serving /api/v1/cart and run an API test?
@som-kanade-zepto everything works, but a trailing / has to be added to the path: requests work for /test/ but not /test, and for /api/v1/cart/ but not /api/v1/cart.
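For reference, the Gateway API spec defines `PathPrefix` matching element by element on `/`-separated labels and states that a trailing `/` is ignored, so `/test` should match `/test`, `/test/` and `/test/abc`, but not `/testing`. The Python sketch below encodes that rule; `path_elements` and `prefix_matches` are hypothetical helpers, not Kong or plugin code. If only `/test/` works in practice, the difference presumably comes from how the route is translated on the Kong side (possibly together with the `konghq.com/strip-path` annotation) rather than from the HTTPRoute itself. That is an inference, not something confirmed in this thread.

```python
def path_elements(path: str) -> list:
    """Split a path into its '/'-separated labels, dropping empty ones.

    Dropping empty labels makes a trailing '/' irrelevant; as a simplification
    it also collapses repeated slashes.
    """
    return [label for label in path.split("/") if label]


def prefix_matches(prefix: str, request_path: str) -> bool:
    """Element-wise PathPrefix match: every label of the prefix must match in order."""
    want = path_elements(prefix)
    got = path_elements(request_path)
    return got[:len(want)] == want


if __name__ == "__main__":
    for path in ["/test", "/test/", "/test/abc", "/testing"]:
        print(path, prefix_matches("/test", path))
```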
I followed these instructions to install Kong: https://docs.konghq.com/kubernetes-ingress-controller/latest/install/helm/. After that, install Argo Rollouts with the plugin and the routes.
@som-kanade-zepto did you follow these instructions to install Kong? If not, can you try them, please?
Yes @Philipp-Plotnikov, we installed it that way.
@som-kanade-zepto as a quick check, can you upgrade Kong and see whether it works, please? Maybe it was a bug in Kong (not sure). For now I don't have any idea what is causing the failures; for me it works. I created 2 HTTPRoutes: /test (for Argo Rollouts) and /api/v1/cart (as a separate HTTPRoute). Both HTTPRoutes point to the same pods created by Argo Rollouts (kostiscodefresh/summer-of-k8s-app:v2). I also created one more HTTPRoute (/single/test) that points to different pods, and it works too. All 3 HTTPRoutes work. If upgrading doesn't help, I will keep thinking about it.
Are you sure you followed the instructions in the link above? The links in the README point to an old version of the docs.
@som-kanade-zepto did the old API stop working when Argo Rollouts finished the canary? Could it be that the old routes point to old pods that were deleted during the canary rollout? I started getting 5xx errors when an HTTPRoute (not connected with the rollout) kept pointing to the old pods after the rollout had already replaced them with the new version.
No rollout happened for the APIs that were failing.
@som-kanade-zepto have you tried upgrading Kong?
Yes, we tried upgrading to Kong 3.6, with no luck.
@som-kanade-zepto can you try the new v0.4.0 release and check whether the problem persists, please?
https://github.com/argoproj-labs/rollouts-plugin-trafficrouter-gatewayapi/releases/tag/v0.4.0
Thanks @Philipp-Plotnikov, will try and let you know what we find.
Hi @Philipp-Plotnikov I tested again with the same rollout.yaml file and noticed that Kong works fine during the rollout promotion. However, when I promote to the new revision and the weight updates to a 100:0 ratio between the stable and canary services, errors start appearing. If I promote the rollout by at least one step, the error disappears.
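For context on the 100:0 state: in the Gateway API, each backendRef receives `weight / sum(weights)` of its rule's traffic, an unset `weight` defaults to 1, and a `weight` of 0 means the entry should get no traffic. At the end of the rollout the HTTPRoute therefore still lists the canary Service, just with weight 0. The Python sketch below computes those shares; `traffic_split` is a hypothetical helper, the 70:30 pair is an assumed example for the `setWeight: 30` step, and the sketch only illustrates the arithmetic without reproducing the Kong error.

```python
from typing import Dict, List, Optional, Tuple


def traffic_split(backend_refs: List[Tuple[str, Optional[int]]]) -> Dict[str, float]:
    """Share of a rule's requests per backendRef: weight / sum of weights.

    A weight of None stands for an unset field and defaults to 1.
    """
    weights = {name: (1 if weight is None else weight) for name, weight in backend_refs}
    total = sum(weights.values())
    if total == 0:
        # Every entry has weight 0, so none of them should receive traffic.
        return {name: 0.0 for name in weights}
    return {name: weight / total for name, weight in weights.items()}


if __name__ == "__main__":
    stable, canary = "argo-rollouts-stable-service", "argo-rollouts-canary-service"
    print(traffic_split([(stable, None), (canary, None)]))  # weights unset, as in the /test route
    print(traffic_split([(stable, 70), (canary, 30)]))      # assumed weights for setWeight: 30
    print(traffic_split([(stable, 100), (canary, 0)]))      # the 100:0 state from the report
```

One side effect worth noting: while its weights are still unset, the `/test` route in the manifest above splits traffic evenly between the stable and canary Services.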
Even if I remove the trafficRouting section completely from rollout.yaml and manually update the HTTP route object weight, the behavior remains the same.
Error in Kong logs:
error Failed parsing resource errors {"url": "https://172.16.118.154:8444", "update_strategy": "InMemory", "error": "could not unmarshal config error: json: cannot unmarshal object into Go struct field ConfigError.flattened_errors of type []sendconfig.FlatEntityError"}
No error in argo-rollouts.
To verify Kong's functionality in case of multiple backend refs, I created two separate deployments and their respective services, not managed by Argo Rollouts. In this setup, Kong correctly routes traffic to the different endpoints without any issues.
Argo Rollouts version: 1.7.2; EKS version: 1.3.0
Any suggestions on how to troubleshoot this error?
I am also facing the same issue. Any luck?