
0.14.1 causing crash loopback related to v1.Gateway

Open PurseChicken opened this issue 1 year ago • 15 comments

What happened:

After updating to 0.14.1, the pod goes into a crash loop with the error:

failed to sync *v1.Gateway: context deadline exceeded

This does not happen with 0.14.0.

What you expected to happen:

For the pod to run without crashing.

How to reproduce it (as minimally and precisely as possible):

Updated to 0.14.1.

Anything else we need to know?:

I believe this is due to the use of a gateway-api source. Our source configuration looks at ingress and http-route resources.

  sources:
    - ingress
    - gateway-httproute

My assumption is that the gateway-api changes in 0.14.1 look for v1 resources / CRDs. That said, at least in GKE, gateway-api resources are still being deployed using v1beta1. The GKE documentation also currently references v1beta1 CRDs.

I imagine that external-dns changes to gateway-api need to support both v1beta1 as well as v1.
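As a rough illustration of what "supporting both" could mean (this is not external-dns's actual code; the function name `choose_gateway_api_version` and the preference order are hypothetical), a client could pick the newest Gateway API version the cluster actually serves and fall back to v1beta1 otherwise:

```python
from typing import Iterable, Optional

# Preference order is an assumption for illustration: newest version first.
PREFERRED_VERSIONS = ("v1", "v1beta1")


def choose_gateway_api_version(served: Iterable[str]) -> Optional[str]:
    """Return the most preferred Gateway API version the cluster serves, or None."""
    available = set(served)
    for version in PREFERRED_VERSIONS:
        if version in available:
            return version
    return None


if __name__ == "__main__":
    # A cluster that only has the v1beta1 CRDs installed.
    print(choose_gateway_api_version(["v1beta1"]))
    # A cluster with upgraded CRDs serving both versions.
    print(choose_gateway_api_version(["v1beta1", "v1"]))
```

With a fallback like this, a cluster that serves only v1beta1 would not leave the client waiting on a v1 resource that never appears.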

PurseChicken avatar Apr 05 '24 19:04 PurseChicken

I believe my assumption is correct. I can see that in "gateway-api: fix wildcard matching" #4124, v1beta1 was removed in favor of v1.

PurseChicken avatar Apr 05 '24 19:04 PurseChicken

Experiencing the same issues with GKE Gateway API

jnauska avatar Apr 05 '24 21:04 jnauska

Same issue here, I was going nuts :D For now I will downgrade the chart version.

Tarjei400 avatar Apr 06 '24 20:04 Tarjei400

Just in case: using chart version 6.38.0 and

apiVersion: gateway.networking.k8s.io/v1alpha2
kind: HTTPRoute

seems to do the trick temporarily on GKE.

Tarjei400 avatar Apr 06 '24 21:04 Tarjei400

After reviewing the changes in 0.14.1 more carefully, I think this is the culprit: https://github.com/kubernetes-sigs/external-dns/pull/4019

Basically, it removes v1beta1 for Gateways and HTTPRoutes, as they have been upgraded to v1, and bumps dependencies to use v1.0.0 of gateway-api.

jnauska avatar Apr 22 '24 14:04 jnauska

I did the initial implementation for Gateway API in External DNS. I wasn't the one who migrated from v1beta1 to v1, but it seemed like a reasonable change to me... I'll take a look at what would be necessary to support both v1 and v1beta1.

abursavich avatar Apr 22 '24 15:04 abursavich

@abursavich Looks like everyone upgraded to v1. Does anyone know when Google plans to upgrade to v1? Maybe supporting both CRD versions won't be needed if it's planned sometime soon.

Tarjei400 avatar Apr 22 '24 16:04 Tarjei400

Technically (pedantically), the docs say:

As the Gateway API is still in an experimental phase, ExternalDNS makes no backwards compatibility guarantees regarding its support.

If you install a newer version of the CRDs, then the resources will be auto-converted by the Kubernetes API server to the new versions, but I don't know if GKE will stomp the change.

There's a SIG-Network Gateway API meeting this afternoon, which I plan to join to get input on this. As an added benefit, there's usually someone from Google there who will probably care about the GKE implementation.
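One way to check which versions a cluster currently serves is to inspect the CRD itself. The sketch below is only an illustration: it assumes you have the CRD as JSON (for example from `kubectl get crd gateways.gateway.networking.k8s.io -o json`), and the helper name `served_versions` is made up here. It reads the standard `spec.versions[].served` field of a CustomResourceDefinition:

```python
import json
from typing import List


def served_versions(crd_json: str) -> List[str]:
    """List the version names a CustomResourceDefinition marks as served."""
    crd = json.loads(crd_json)
    return [v["name"] for v in crd["spec"]["versions"] if v.get("served", False)]


if __name__ == "__main__":
    # Hand-written sample resembling a CRD that serves only v1beta1.
    sample = json.dumps({
        "spec": {
            "versions": [
                {"name": "v1alpha2", "served": False, "storage": False},
                {"name": "v1beta1", "served": True, "storage": True},
            ]
        }
    })
    print(served_versions(sample))
```

If `v1` is missing from the result, a client that only watches `v1` resources has nothing to sync against.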

abursavich avatar Apr 22 '24 16:04 abursavich

The CRD Management guidelines seem to imply that if GKE rolled back the newer CRDs then it would be a bug in GKE:

Some implementations may also want to bundle CRDs to simplify installation. This is acceptable as long as they never:

  1. Overwrite Gateway API CRDs that have unrecognized or newer versions.
  2. Overwrite Gateway API CRDs that have a different release channel.
  3. Remove Gateway API CRDs.

abursavich avatar Apr 22 '24 16:04 abursavich

Thank you for your help @abursavich , much appreciated. Ping me if you need any type of support.

Raffo avatar Apr 23 '24 06:04 Raffo

Still an issue in 0.14.2

PurseChicken avatar Jun 12 '24 22:06 PurseChicken

I tested that 0.14.1 and later work with a newer Gateway API version on GKE.

Maybe the External-DNS documentation should now be changed to reflect that Gateway API is no longer in an experimental phase since the 1.0.0 GA release, and subsequent breaking changes should be labeled as such.

jnauska avatar Jun 13 '24 06:06 jnauska

Same problem here using 0.14.2 on GKE.

omriarieli avatar Jun 23 '24 09:06 omriarieli

@abursavich you clarified that this is an issue in GKE, correct? Can this issue be closed?

candita avatar Jun 28 '24 22:06 candita

@candita This is not just a GKE issue; it is an issue for everyone not using the GA version of Gateway API. GKE bundles the Gateway API CRDs with each GKE version, which is why most of the reported issues show up there. The change that caused this is comparable to removing support for v1beta1 from Ingress in the middle of a transition period.

But as already said in this thread, the external-dns docs state:

As the Gateway API is still in an experimental phase, ExternalDNS makes no backwards compatibility guarantees regarding its support.

This issue can be swept under the rug with this documentation comment, as users should be able to upgrade the Gateway API version themselves. But maybe the documentation should now reflect that Gateway API is GA.

jnauska avatar Jun 29 '24 05:06 jnauska