contour
TCPRoute delete not propagated
What steps did you take and what happened: I create a Gateway with a TCP listener and confirm it's admitted. I create a TCPRoute and confirm it's accepted. An external call to the listener port works as expected.
I then delete the TCPRoute. Nothing gets logged on the contour gateway or on the envoy pod (unlike when the TCPRoute is created). An external call to the listener port still works.
I then update the Gateway to add a new listener. The Gateway gets reconciled. An external call to the listener port doesn't work anymore.
What did you expect to happen: The service shouldn't be mapped to the listener after route deletion.
Environment:
- Contour version: 1.28.2
- Kubernetes version: (use kubectl version): 1.29.1 (same behavior on 1.28.2 before upgrade)
Hey @bhamon! Thanks for opening your first issue. We appreciate your contribution and welcome you to our community! We are glad to have you here and to have your input on Contour. You can also join us on our mailing list and in our channel in the Kubernetes Slack Workspace
The Contour project currently lacks enough contributors to adequately respond to all Issues.
This bot triages Issues according to the following rules:
- After 60d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, the Issue is closed
You can:
- Mark this Issue as fresh by commenting
- Close this Issue
- Offer to help out with triage
Please send feedback to the #contour channel in the Kubernetes Slack
Yep, I can repro this using Contour 1.28.2; however, it's fixed as of 1.29.
Root cause:
- In 1.28.2, on a TCPRoute deletion, the TCPRoute controller passes a dummy TCPRoute to the handler to trigger a delete (https://github.com/projectcontour/contour/blob/v1.28.2/internal/controller/tcproute.go#L63-L68). Note that this dummy TCPRoute only has namespace and name populated.
- Within the DAG cache, on a delete, we try to look at the TCPRoute's parent refs to determine if it is a relevant resource (i.e. attached to the relevant Gateway) that needs to trigger a DAG rebuild (https://github.com/projectcontour/contour/blob/v1.28.2/internal/dag/cache.go#L406). However, these parent refs are never populated per the above bullet, so the DAG is never rebuilt on a delete.
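To make the two bullets above concrete, here is a minimal sketch of the failure mode. It is not Contour's actual code: the types `ParentRef` and `TCPRoute` and the functions `routeReferencesGateway` and `onDelete` are hypothetical, simplified stand-ins that only model the relevance check on parent refs.

```go
package main

import "fmt"

// ParentRef and TCPRoute are simplified, hypothetical stand-ins for the
// Gateway API types. They are not Contour's or Gateway API's real types.
type ParentRef struct {
	Namespace, Name string
}

type TCPRoute struct {
	Namespace, Name string
	ParentRefs      []ParentRef
}

// routeReferencesGateway reports whether any parent ref of the route
// points at the given Gateway.
func routeReferencesGateway(route TCPRoute, gw ParentRef) bool {
	for _, ref := range route.ParentRefs {
		if ref == gw {
			return true
		}
	}
	return false
}

// onDelete models the cache's delete path: it returns true when the
// deleted route is considered relevant and a DAG rebuild should follow.
func onDelete(route TCPRoute, gw ParentRef) bool {
	return routeReferencesGateway(route, gw)
}

func main() {
	gw := ParentRef{Namespace: "projectcontour", Name: "contour"}

	// The full object, as it existed in the cluster before deletion.
	full := TCPRoute{Namespace: "default", Name: "echo", ParentRefs: []ParentRef{gw}}

	// What the delete handler receives in the scenario described above:
	// only namespace and name are populated.
	stub := TCPRoute{Namespace: full.Namespace, Name: full.Name}

	fmt.Println("full object triggers rebuild:", onDelete(full, gw))
	fmt.Println("stub object triggers rebuild:", onDelete(stub, gw))
}
```

With the stub object the relevance check has no parent refs to match, so no rebuild is requested and the stale listener mapping stays in place until something else (such as a Gateway update) forces a rebuild.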
In 1.29 we moved away from using controller-runtime controllers for Gateway API resources and instead directly use client-go informers (ref. https://github.com/projectcontour/contour/blob/v1.29.0/cmd/contour/serve.go#L1013-L1042), which give you the full state of the deleted object on deletes, so the DAG cache deletion logic works as intended.
We might consider backporting a fix for this (we'd likely just always trigger a DAG rebuild on any Gateway API resource deletion); however, the easiest path forward is to upgrade to 1.29.
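A rough sketch of what "always trigger a DAG rebuild on deletion" could look like, under the assumption of a toy cache keyed by namespace/name. The `Route` and `Cache` types and the `Remove` method are hypothetical and not taken from Contour's code base.

```go
package main

import "fmt"

// Route is a hypothetical, simplified route object. On deletion only
// Namespace and Name are assumed to be populated.
type Route struct {
	Namespace, Name string
}

// Cache is a toy stand-in for a DAG cache that tracks routes by key.
type Cache struct {
	routes map[string]Route
}

func key(r Route) string { return r.Namespace + "/" + r.Name }

// Insert stores the route in the cache.
func (c *Cache) Insert(r Route) {
	c.routes[key(r)] = r
}

// Remove drops the route and always reports that a rebuild is needed.
// It deliberately does not inspect parent refs, since they may be
// missing from the object handed to the delete handler.
func (c *Cache) Remove(r Route) bool {
	delete(c.routes, key(r))
	return true
}

func main() {
	c := &Cache{routes: map[string]Route{}}
	c.Insert(Route{Namespace: "default", Name: "echo"})

	// Deleting with a name-and-namespace-only object still requests a rebuild.
	rebuild := c.Remove(Route{Namespace: "default", Name: "echo"})
	fmt.Println("rebuild requested:", rebuild, "routes left:", len(c.routes))
}
```

The trade-off in this sketch is an occasional unnecessary rebuild for routes that were never attached to the relevant Gateway, in exchange for never missing a relevant deletion.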
Thanks for the report!