gateway
gateway copied to clipboard
make experimental gateway api crds optional so that gateway-api upgrade don't break envoy gateway
Description:
What issue is being seen? Describe what should be happening instead of the bug, for example: Envoy should not crash, the expected value isn't returned, etc.
Recently I upgraded the gateway-api to v1.1.0 from the standard channel, but I found that envoy-gateway is in error due to missing some experimental gateway api crds, even though I have not directly used GRPCRoute gateway.networking.k8s.io/v1alpha2 at all.
Make these experimental crds optional could help users to upgrade newer gateway-api releases.
envoy-gateway log:
Error: failed to create provider Kubernetes: failted to create gatewayapi controller: no matches for kind "GRPCRoute" in version "gateway.networking.k
8s.io/v1alpha2"
Usage:
envoy-gateway server [flags]
Aliases:
server, serve
Flags:
-c, --config-path string The path to the configuration file.
-h, --help help for server
failed to create provider Kubernetes: failted to create gatewayapi controller: no matches for kind "GRPCRoute" in version "gateway.networking.k8s.io/v
1alpha2"
More context: https://github.com/kubernetes-sigs/gateway-api/issues/3075 and the way istio handle it.
Repro steps:
Include sample requests, environment, etc. All data and inputs required to reproduce the bug.
Note: If there are privacy concerns, sanitize the data prior to sharing.
Environment:
Include the environment like gateway version, envoy version and so on.
Logs:
Include the access logs and the Envoy logs.
we've implemented something similar for ServiceImport where we check if it exists before watching it. For this case we are not the CRD owners, so this decision was easy
https://github.com/envoyproxy/gateway/blob/3bc7cf1d00635f602bc4731a904675d982d34317/internal/provider/kubernetes/controller.go#L1099
Do we need to the same for
- Gateway API CRDs - that are not installed from the EG Helm Chart, but pre-applied and managed by the cluster admin (e.g. GKE) (helm not updating CRDs just highlights this case even more )
- EG CRDs (BackendTrafficPolicy etc) - that are not installed in the cluster, where the user wants to run EG in minimal mode (Gateway API featureset) and pre install the Gateway API CRDs from the standard channel themselves.
cc @envoyproxy/gateway-maintainers
+1 on the issue
I have a similar but different issue. I have installed the experimental apis for v1.1.0 and eg is puking with this:
failed to create provider Kubernetes: failted to create gatewayapi controller: no matches for kind "BackendTLSPolicy" in version "gateway.networking.k8s.io/v1alpha2"
The latest version is v1alpha3.
@travisghansen EG v1.0.2 is using gateway-api v1.0.0, in which BackendTLSPolicy is v1alpha2. If you want to test with gateway-api v1.1.0, you can install EG latest version(v0.0.0-latest), which is built with the latest main branch code and currently depends on gateway-api v1.1.0 and BackendTLSPolicy is v1alpha3.
My two cents:
we've implemented something similar for
ServiceImportwhere we check if it exists before watching it. For this case we are not the CRD owners, so this decision was easyhttps://github.com/envoyproxy/gateway/blob/3bc7cf1d00635f602bc4731a904675d982d34317/internal/provider/kubernetes/controller.go#L1099
ServiceImport as backendRef is an opt-in feature for EG, not all EG users need it, so it's safe to ignore it when the CRD doesn't exist.
Do we need to the same for
- Gateway API CRDs - that are not installed from the EG Helm Chart, but pre-applied and managed by the cluster admin (e.g. GKE) (helm not updating CRDs just highlights this case even more )
IMO, it would be preferable for EG to fail to start and report an error rather than starting and ignoring the CRDs. The absence of these CRDs usually indicate an unexpected issue: mismatch between Gateway API version and EG version, so it's better to fail early. A clearer error message would be useful, for example: “EG depends on Gateway API v x.x.x, please install this version of Gateway API before starting EG.”
- EG CRDs (BackendTrafficPolicy etc) - that are not installed in the cluster, where the user wants to run EG in minimal mode (Gateway API featureset) and pre install the Gateway API CRDs from the standard channel themselves.
In theory, yes, but I doubt anyone wants this, we can do this when people ask for it.
In the meantime, Envoy Gateway v1.1.0 was released; I am trying to use it with the Gateway API 1.1.0 CRDs and get the same error:
Error: failed to create provider Kubernetes: failted to create gatewayapi controller: no matches for kind "BackendTLSPolicy" in version "gateway.networking.k8s.io/v1alpha3"
I am using FluxCD to deploy it, the configuration is below:
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
name: envoy-gateway-release
namespace: flux-system
spec:
install:
createNamespace: true
interval: 1m
targetNamespace: envoy-gateway
storageNamespace: envoy-gateway
chart:
spec:
chart: gateway-helm
sourceRef:
kind: HelmRepository
name: envoy-gateway-repo
interval: 1m
version: 'v1.1.0'
The Gateway API CRDs are also deployed with Flux:
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
name: gateway-api-repo
namespace: flux-system
spec:
interval: 1m0s
ref:
tag: v1.1.0
url: https://github.com/kubernetes-sigs/gateway-api
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
name: gateway-api-crd
namespace: flux-system
spec:
interval: 1m
path: ./config/crd/standard
prune: true
sourceRef:
kind: GitRepository
name: gateway-api-repo
@inistor The BackendTLSPolicy version in Gtaeway API v1.1.0 is v1alpha3.
Can you verify it with this command?
kubectl get crd backendtlspolicies.gateway.networking.k8s.io -o jsonpath='{.spec.versions[*].name}'
Thanks for the quick response. Sure, here goes:
➜ ~ kubectl get crd backendtlspolicies.gateway.networking.k8s.io -o jsonpath='{.spec.versions[*].name}'
v1alpha2
v1alpha2 seems to be the one included with GW API CRD v1.1.0 or is my CRD deployment somehow lagging behind?
Looks like the CRD is a wrong version. You can update it to v1alpha3 in the Gateway API v1.1.0 and try again. https://github.com/kubernetes-sigs/gateway-api/releases/download/v1.1.0/experimental-install.yaml
What you linked is the experimental channel; I am using the standard channel. Likely, this is the reason.
However, I would say that installing the release Envoy Proxy Gateway should not require the experimental channel CRDs - unless explicitly enabled (say, via a Helm value); moreover, no such option exist (neither to enable nor to disable the use of experimental CRDs).
I believe this is the actual point of this issue: the inability to use the Envoy Proxy Gateway with the "standard" channel Gateway API CRDs.
Oh, I understand now. Thanks for the explanation. I thought you needed Experimental channel CRDs as BackendTLSPolicy v1alpha2 existed in the cluster.
If that's the case, we need to decide whether EG should support opting out of the experimental channel CRDs. I can bring this up for discussion in the next EG meeting.
Thank you for considering this.
Just to point out that some other Gateway API implementers have taken a similar approach:
- Traefik: https://github.com/traefik/traefik-helm-chart/blob/403504c9a28f8c36b9c811651a6d080a1d5d0537/traefik/values.yaml#L129
- Nginx Gateway Fabric: https://github.com/nginxinc/nginx-gateway-fabric/blob/3ca9a69b12b101630fbcac6fbe0baea508a28c6d/charts/nginx-gateway-fabric/values.yaml#L76
adding this into the v1.2 milestone, lets do this in a dynamic way similar to what is done for ServiceImport https://github.com/envoyproxy/gateway/blob/3bc7cf1d00635f602bc4731a904675d982d34317/internal/provider/kubernetes/controller.go#L1099
I have a similar issues going from 1.0.1 to 1.1.0
"failed to create provider Kubernetes: failted to create gatewayapi controller: no matches for kind "BackendTLSPolicy" in version "[gateway.networking.k8s.io/v1alpha3](http://gateway.networking.k8s.io/v1alpha3)""
I have a similar issues going from 1.0.1 to 1.1.0
"failed to create provider Kubernetes: failted to create gatewayapi controller: no matches for kind "BackendTLSPolicy" in version "[gateway.networking.k8s.io/v1alpha3](http://gateway.networking.k8s.io/v1alpha3)""
@davem-git v1.0 to v1.1 upgrade steps are highlighted here https://gateway.envoyproxy.io/docs/install/install-yaml/#upgrading-from-v10
I will try that on my second instance. I deleted the whole thing to reinstall fresh, something I can do on dev but not once we get to prod