gateway icon indicating copy to clipboard operation
gateway copied to clipboard

make experimental gateway api crds optional so that gateway-api upgrade don't break envoy gateway

Open networkhermit opened this issue 1 year ago • 17 comments
trafficstars

Description:

What issue is being seen? Describe what should be happening instead of the bug, for example: Envoy should not crash, the expected value isn't returned, etc.

Recently I upgraded the gateway-api to v1.1.0 from the standard channel, but I found that envoy-gateway is in error due to missing some experimental gateway api crds, even though I have not directly used GRPCRoute gateway.networking.k8s.io/v1alpha2 at all.

Make these experimental crds optional could help users to upgrade newer gateway-api releases.

envoy-gateway log:

Error: failed to create provider Kubernetes: failted to create gatewayapi controller: no matches for kind "GRPCRoute" in version "gateway.networking.k
8s.io/v1alpha2"
Usage:
  envoy-gateway server [flags]

Aliases:
  server, serve

Flags:
  -c, --config-path string   The path to the configuration file.
  -h, --help                 help for server

failed to create provider Kubernetes: failted to create gatewayapi controller: no matches for kind "GRPCRoute" in version "gateway.networking.k8s.io/v
1alpha2"

More context: https://github.com/kubernetes-sigs/gateway-api/issues/3075 and the way istio handle it.

Repro steps:

Include sample requests, environment, etc. All data and inputs required to reproduce the bug.

Note: If there are privacy concerns, sanitize the data prior to sharing.

Environment:

Include the environment like gateway version, envoy version and so on.

Logs:

Include the access logs and the Envoy logs.

networkhermit avatar May 13 '24 15:05 networkhermit

we've implemented something similar for ServiceImport where we check if it exists before watching it. For this case we are not the CRD owners, so this decision was easy https://github.com/envoyproxy/gateway/blob/3bc7cf1d00635f602bc4731a904675d982d34317/internal/provider/kubernetes/controller.go#L1099

Do we need to the same for

  1. Gateway API CRDs - that are not installed from the EG Helm Chart, but pre-applied and managed by the cluster admin (e.g. GKE) (helm not updating CRDs just highlights this case even more )
  2. EG CRDs (BackendTrafficPolicy etc) - that are not installed in the cluster, where the user wants to run EG in minimal mode (Gateway API featureset) and pre install the Gateway API CRDs from the standard channel themselves.

cc @envoyproxy/gateway-maintainers

arkodg avatar May 13 '24 19:05 arkodg

+1 on the issue

nezdolik avatar Jun 10 '24 12:06 nezdolik

I have a similar but different issue. I have installed the experimental apis for v1.1.0 and eg is puking with this:

failed to create provider Kubernetes: failted to create gatewayapi controller: no matches for kind "BackendTLSPolicy" in version "gateway.networking.k8s.io/v1alpha2"

The latest version is v1alpha3.

travisghansen avatar Jul 06 '24 00:07 travisghansen

@travisghansen EG v1.0.2 is using gateway-api v1.0.0, in which BackendTLSPolicy is v1alpha2. If you want to test with gateway-api v1.1.0, you can install EG latest version(v0.0.0-latest), which is built with the latest main branch code and currently depends on gateway-api v1.1.0 and BackendTLSPolicy is v1alpha3.

zhaohuabing avatar Jul 06 '24 01:07 zhaohuabing

My two cents:

we've implemented something similar for ServiceImport where we check if it exists before watching it. For this case we are not the CRD owners, so this decision was easy

https://github.com/envoyproxy/gateway/blob/3bc7cf1d00635f602bc4731a904675d982d34317/internal/provider/kubernetes/controller.go#L1099

ServiceImport as backendRef is an opt-in feature for EG, not all EG users need it, so it's safe to ignore it when the CRD doesn't exist.

Do we need to the same for

  1. Gateway API CRDs - that are not installed from the EG Helm Chart, but pre-applied and managed by the cluster admin (e.g. GKE) (helm not updating CRDs just highlights this case even more )

IMO, it would be preferable for EG to fail to start and report an error rather than starting and ignoring the CRDs. The absence of these CRDs usually indicate an unexpected issue: mismatch between Gateway API version and EG version, so it's better to fail early. A clearer error message would be useful, for example: “EG depends on Gateway API v x.x.x, please install this version of Gateway API before starting EG.”

  1. EG CRDs (BackendTrafficPolicy etc) - that are not installed in the cluster, where the user wants to run EG in minimal mode (Gateway API featureset) and pre install the Gateway API CRDs from the standard channel themselves.

In theory, yes, but I doubt anyone wants this, we can do this when people ask for it.

zhaohuabing avatar Jul 06 '24 02:07 zhaohuabing

In the meantime, Envoy Gateway v1.1.0 was released; I am trying to use it with the Gateway API 1.1.0 CRDs and get the same error:

Error: failed to create provider Kubernetes: failted to create gatewayapi controller: no matches for kind "BackendTLSPolicy" in version "gateway.networking.k8s.io/v1alpha3"

I am using FluxCD to deploy it, the configuration is below:

apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: envoy-gateway-release
  namespace: flux-system
spec:
  install:
    createNamespace: true
  interval: 1m
  targetNamespace: envoy-gateway
  storageNamespace: envoy-gateway
  chart:
    spec:
      chart: gateway-helm
      sourceRef:
        kind: HelmRepository
        name: envoy-gateway-repo
      interval: 1m
      version: 'v1.1.0'

The Gateway API CRDs are also deployed with Flux:

apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: gateway-api-repo
  namespace: flux-system
spec:
  interval: 1m0s
  ref:
    tag: v1.1.0
  url: https://github.com/kubernetes-sigs/gateway-api
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: gateway-api-crd
  namespace: flux-system
spec:
  interval: 1m
  path: ./config/crd/standard
  prune: true
  sourceRef:
    kind: GitRepository
    name: gateway-api-repo

inistor avatar Jul 24 '24 16:07 inistor

@inistor The BackendTLSPolicy version in Gtaeway API v1.1.0 is v1alpha3.

Can you verify it with this command?

kubectl get crd backendtlspolicies.gateway.networking.k8s.io -o jsonpath='{.spec.versions[*].name}'

zhaohuabing avatar Jul 25 '24 05:07 zhaohuabing

Thanks for the quick response. Sure, here goes:

➜  ~ kubectl get crd backendtlspolicies.gateway.networking.k8s.io -o jsonpath='{.spec.versions[*].name}'
v1alpha2

v1alpha2 seems to be the one included with GW API CRD v1.1.0 or is my CRD deployment somehow lagging behind?

inistor avatar Jul 25 '24 05:07 inistor

Looks like the CRD is a wrong version. You can update it to v1alpha3 in the Gateway API v1.1.0 and try again. https://github.com/kubernetes-sigs/gateway-api/releases/download/v1.1.0/experimental-install.yaml

zhaohuabing avatar Jul 25 '24 13:07 zhaohuabing

What you linked is the experimental channel; I am using the standard channel. Likely, this is the reason.

However, I would say that installing the release Envoy Proxy Gateway should not require the experimental channel CRDs - unless explicitly enabled (say, via a Helm value); moreover, no such option exist (neither to enable nor to disable the use of experimental CRDs).

I believe this is the actual point of this issue: the inability to use the Envoy Proxy Gateway with the "standard" channel Gateway API CRDs.

inistor avatar Jul 25 '24 13:07 inistor

Oh, I understand now. Thanks for the explanation. I thought you needed Experimental channel CRDs as BackendTLSPolicy v1alpha2 existed in the cluster.

If that's the case, we need to decide whether EG should support opting out of the experimental channel CRDs. I can bring this up for discussion in the next EG meeting.

zhaohuabing avatar Jul 25 '24 13:07 zhaohuabing

Thank you for considering this.

inistor avatar Jul 25 '24 13:07 inistor

Just to point out that some other Gateway API implementers have taken a similar approach:

  • Traefik: https://github.com/traefik/traefik-helm-chart/blob/403504c9a28f8c36b9c811651a6d080a1d5d0537/traefik/values.yaml#L129
  • Nginx Gateway Fabric: https://github.com/nginxinc/nginx-gateway-fabric/blob/3ca9a69b12b101630fbcac6fbe0baea508a28c6d/charts/nginx-gateway-fabric/values.yaml#L76

inistor avatar Jul 31 '24 07:07 inistor

adding this into the v1.2 milestone, lets do this in a dynamic way similar to what is done for ServiceImport https://github.com/envoyproxy/gateway/blob/3bc7cf1d00635f602bc4731a904675d982d34317/internal/provider/kubernetes/controller.go#L1099

arkodg avatar Jul 31 '24 19:07 arkodg

I have a similar issues going from 1.0.1 to 1.1.0

 "failed to create provider Kubernetes: failted to create gatewayapi controller: no matches for kind "BackendTLSPolicy" in version "[gateway.networking.k8s.io/v1alpha3](http://gateway.networking.k8s.io/v1alpha3)""

davem-git avatar Aug 12 '24 21:08 davem-git

I have a similar issues going from 1.0.1 to 1.1.0

 "failed to create provider Kubernetes: failted to create gatewayapi controller: no matches for kind "BackendTLSPolicy" in version "[gateway.networking.k8s.io/v1alpha3](http://gateway.networking.k8s.io/v1alpha3)""

@davem-git v1.0 to v1.1 upgrade steps are highlighted here https://gateway.envoyproxy.io/docs/install/install-yaml/#upgrading-from-v10

arkodg avatar Aug 12 '24 21:08 arkodg

I will try that on my second instance. I deleted the whole thing to reinstall fresh, something I can do on dev but not once we get to prod

davem-git avatar Aug 12 '24 21:08 davem-git