gateway icon indicating copy to clipboard operation
gateway copied to clipboard

CRDs upgrade support

Open shahar-h opened this issue 1 year ago • 10 comments

Description: Currently Gateway API and EG CRDs are located in EG gateway-helm chart crds folder. This means that CRDs are not upgraded on chart upgrade, and need to be manually upgraded beforehand.

Possible solutions:

  1. Move crds inside templates. Real world example: cert-manager chart has CRDs as part of templates folder and exposes crds.enabled flag. It also allows to decide if we want to keep the CRDs on chart uninstallation with crds.keep flag, by leveraging "helm.sh/resource-policy": keep helm annotation. See: https://cert-manager.io/v1.13-docs/installation/helm/#crd-considerations. However, this is a breaking change for consumers that use EG chart as a sub chart and have custom resources as part of the main chart.
  2. Split the current chart into crds chart and application chart, and instruct to install the crds chart first. Real world example: Istio provides a base chart that installs CRDs: https://istio.io/latest/docs/setup/install/helm/#installation-steps.

Both 1 and 2 can also solve https://github.com/envoyproxy/gateway/issues/3094 by providing a separate flags(1)/charts(2) for Gateway API and EG CRDs.

Also Related:

  • https://github.com/envoyproxy/gateway/issues/2031
  • https://github.com/envoyproxy/gateway/issues/1721

WDYT?

shahar-h avatar Aug 05 '24 06:08 shahar-h

While trying to split the chart into 2 charts(app chart and CRDs chart) internally CRDs chart installation failed because helm release secret exceeded max 1MB size limit. In order to resolve that issue I split the CRDs chart into 2 more charts - eg CRDS and gateway api CRDs.

shahar-h avatar Sep 04 '24 03:09 shahar-h

thinking out loud, we could add two additional templates (w/o deleting the /crds dir) for the CRDs - one for gateway-api and one for envoy-gateway and use knobs such as applyCrds.all applyCrds.gateway-api and applyCrds.envoy-gateway

This solves 2 use cases

  • For cases where the cloud provider like GKE installs the gateway-api resources, a user could install EG with helm install --skip-crds --set applyCrds.envoy-gateway=true to selectively install EG CRDs
  • For upgrade cases , users could run helm upgrade --set applyCrds.all=true to update all CRDs or with --set applyCrds.envoy-gateway=true to only update EG CRDs

the naming for the helm knobs can be improved, of course

arkodg avatar Sep 11 '24 23:09 arkodg

@arkodg installing both EG and gateway-api CRDs from the same chart won't work since helm release secret will exceed 1MB limit as I mentioned here. In addition, installing CRDs from EG chart templates would not work for umbrella charts that contain some CRD instance.

What about keeping the crds folder and create 2 additional CRDs charts?

shahar-h avatar Sep 17 '24 12:09 shahar-h

ah thanks for trying it out @shahar-h .

Looks like we have a decision to make, neither being ideal

  1. Move CRDs from /crds to templates/ with an opt in / opt out approach similar to what cert-manager does (opt in) Pros
  • Users - Makes sure versions of CRD and controller are same, so there's implicit order and support during version upgrades
  • Maintainers - One less chart to maintain
  1. Add another gateway-crds-helm chart similar to what linked does https://linkerd.io/2-edge/tasks/install-helm/#linkerd-crds Pros
  • Explicit split up of responsibilities b/w cluster admin (installing CRDs) and platform admin (installing and running EG) Cons
  • Users need to ensure they are installing compatible/same versions

Common issues

  • Users may forget to opt in or opt out of installing CRDs during install and upgrade, which is probably also true for a gateway-crds-helm chart where users may forget to install it.

arkodg avatar Sep 17 '24 21:09 arkodg

argo also puts crds in templates

arkodg avatar Sep 17 '24 21:09 arkodg

@shahar-h can you help investigate who is the consuming most of the space ?

arkodg avatar Sep 20 '24 00:09 arkodg

@shahar-h can you help investigate who is the consuming most of the space ?

Sure, will do.

shahar-h avatar Sep 20 '24 05:09 shahar-h

thanks ! I did a little digging, one way to reduce size could be to rm the API descriptions, but that would break kubectl explain

arkodg avatar Sep 20 '24 17:09 arkodg

I computed eg CRDS & gwapi CRDs size by creating separate charts and installing each chart. Then I computed helm secret release size for each one of them with the following command:

kubectl -n envoy-gateway-system get secret <helm-release-secret-name> -o jsonpath='{.data.release}' | base64 -d | wc -c 

Results:

  • gwapi crds chart release secret size - 494132 Bytes (0.49MB)
  • eg crds chart release secret size - 685360 Bytes (0.68MB)

Total: 1.17MB > 1MB size limit

Note that the helm release secret size is lower that real CRDs size since helm uses gzip compression.

BTW there is an open PR to split helm releases into multiple k8s secrets.

shahar-h avatar Sep 23 '24 06:09 shahar-h

thanks for analyzing this ! the linked PR and the approach of using multiple k8s secrets looks promising, we could wait it out until that feature goes in and then move this the CRDs into templates

arkodg avatar Sep 23 '24 17:09 arkodg

This issue has been automatically marked as stale because it has not had activity in the last 30 days.

github-actions[bot] avatar Nov 15 '24 00:11 github-actions[bot]

I'm not sure if I should open a new issue or add something here, but let's start by doing it here.

A challenge I have with Envoy Gateway is also CRDs related. I'm using cert-manager with Gateway API support enabled to issue certificates for Envoy (https://gateway.envoyproxy.io/docs/tasks/security/tls-cert-manager/). Works beautifully, however today we ran into the issue that cert-manager with this Gateway API support enabled depends on the Kubernetes Gateway API CRDs. I can't recall the details (since it's been a few weeks ago that I looked at it for a different reason), but essentially I think it was trying to subscribe to any changes on the gateway.networking.k8s.io/v1:Gateway resources. I was installing cert-manager before envoy-gateway because that seemed the obvious order, but that didn't work because of this dependency.

A few weeks ago the obvious choice was to reverse that order, but today this ordering popped up again and we really want to reverse that order.

Okay, reason why I have this comment here is that it seems strange to me that the CRDs of the Kubernetes Gateway API are installed directly in this chart. It seems to me that the Kubernetes Gateway API CRDs are an 'interface' of which Envoy Gateway is an implementation, right? In that sense I would rather have this split as it would (probably?) allow me to install the Kubernetes Gateway API CRDs separately, then install cert-manager and then install the 'implementation': Envoy Gateway.

I hope that makes sense.

It would maybe also be a possibilty to have the Kubernets Gateway API as a conditional subchart of envoy-gateway-helm like kube-prometheus-stack is doing.

This could also be it's own issue, let me know.

aukevanleeuwen avatar Feb 07 '25 13:02 aukevanleeuwen

This issue has been automatically marked as stale because it has not had activity in the last 30 days.

github-actions[bot] avatar Mar 09 '25 20:03 github-actions[bot]

Just to add a 👍 for @aukevanleeuwen's comment. Gateway API CRDs should be treated as an external dependency. While it's nice as a convenience to have them installed alongside Envoy Gateway, users should be able to install Gateway API CRDs separately.

This would be useful in, for example, GitOps, where it's nice to be able to enforce the general deployment order: CRDs -> controllers -> configs.

bozho avatar Mar 25 '25 10:03 bozho

Got the same issue, but In my case I'm trying to migrate from another Gateway API vendor and since I already have Gateway API installed in my cluster I'm not able to install Envoy Gateway in the same environment.

I agree with @aukevanleeuwen and I think we should just remove Gateway API from the helm chart all together.

I will create a PR tomorrow to remove this dependency (yes, it will be a breaking change) and let's see what the maintainers say. Sadly, this won't solve the update issue, that this actually is about.

In the grafana-operator we publish our own CRDs as part of our releases in github as a separate file + in the helm chart. So if you want, you can update the EG CRDs separately based on the same version number as the helm chart, which is a simple solution.

nissessenap avatar Mar 25 '25 19:03 nissessenap

we cannot delete the gateway-api CRDs from the Helm Chart, that negatively impacts the install experience for most cases, but we can account for the cases you've mentioned

  • we can start by introducing a new chart gateway-crds-helm that installs the Envoy Gateway CRDs by default
  • we can also additionally generate a k8s install release artifact - envoy-gateway-crds.yaml similar to the one in the PR
  • once the Helm issue is resolved, we can also add Gateway API CRDs to gateway-crds-helm but disable it by default

wdyt @shahar-h @envoyproxy/gateway-maintainers @envoyproxy/gateway-reviewers ?

arkodg avatar Mar 25 '25 21:03 arkodg

+1 on adding a new chart named gateway-crds-helm

zirain avatar Mar 26 '25 01:03 zirain

+1

shahar-h avatar Mar 26 '25 04:03 shahar-h

I don't understand what issues would get solved by creating a new helm chart called gateway-crds-helm

And I don't understand what you mean with

  • once the Helm issue is resolved, we can also add Gateway API CRDs to gateway-crds-helm but disable it by default

The funny thing about helm is that if you use subcharts you can't set skip-crds. So the only way it would help is to move the CRDs in to the template folder any way. So then we can do it in the current chart just the same. Adding another helm chart will just create complexity.

I created a PR to move them from the CRDs folder in to template folder instead. This way we do it the same way as cert-manager. If this is the way forward you like, I can make that PR also include the Gateway CRDs as a separate artifact so you also have the option to not install the CRDs trough helm at all. That PR also needs fixes when it comes to docs and other stuff.

nissessenap avatar Mar 26 '25 08:03 nissessenap

@NissesSenap moving CRDs from crds folder to templates folder would be a breaking change for consumers that use EG chart as a sub chart and have EG/GW-API custom resources as part of the main chart, because they rely on EG crds to be installed before EG/GW-API custom resources.

shahar-h avatar Mar 26 '25 09:03 shahar-h

This is true. But if you got a helm chart that only contains the CRDs in a subchart in the template folder, it will also be a breaking change. And to my knowledge you can't set the order of when to install subcharts.

So I don't think you can't guarantee that the CRDs will be installed before the chart anyway, but I might be wrong here.

Or do you want to have 3 helm charts? One for the Gateway API CRDs and one for EG CRDs? Since adding the Both the Gateway API CRD and the EG CRD is a big part of the problem, at least for me.

To be honest, I don't care how this gets solved, I just want to solve it. Please explain exactly how you want it to be done, and I can create a PR for it assuming it solves both the use-case of upgrading the CRDs and not forcing me to install the GW-API CRDs.

nissessenap avatar Mar 26 '25 10:03 nissessenap

wdyt @shahar-h @envoyproxy/gateway-maintainers @envoyproxy/gateway-reviewers ?

+1

guydc avatar Mar 26 '25 17:03 guydc

Now that we have consensus, @NissesSenap assigning this to you since you mentioned you're interested in picking this up Outlined the steps in https://github.com/envoyproxy/gateway/issues/4001#issuecomment-2752626243, the first one being creating a new 'gateway-crds-helm'' chart ( the current one gateway-halm chart is unaffected)

arkodg avatar Mar 26 '25 18:03 arkodg

@arkodg what I don't understand in your comment is

  • once the Helm issue is resolved, we can also add Gateway API CRDs to gateway-crds-helm but disable it by default

After I create a separate helm chart for the eg CRDs then what?

What helm issue is it that you are waiting to get resolved? If you are talking about the upstream helm issues that cause all this pain around crd when it comes to helm, I don't think every will get solved. Is it some other helm issue?

But I can create a PR that creates a new helm chart with EG CRDs in the template folder to make it possible to disable. Or do you want them in the CRD folder?

nissessenap avatar Mar 26 '25 18:03 nissessenap

yes, the new gateway-crds-helm chart https://github.com/envoyproxy/gateway/tree/main/charts will only contain the EG CRDs and will live under templates/

the helm issue i'm referring to is https://github.com/envoyproxy/gateway/issues/4001#issuecomment-2367361128 as shahar mentioned, it may not be applicable if the data is compressed, so its worth trying to also add the Gateway API crds in the new gateway-crds-helm chart, but that can be done as a follow up

arkodg avatar Mar 26 '25 18:03 arkodg

@arkodg I put all the fixes in the same PR, hopefully you all think it loos okay. We could add some more documentation about using the new helm chart. But let's move the conversation over to the PR.

nissessenap avatar Mar 27 '25 14:03 nissessenap

keeping this open to track CI work to push charts to docker hub + docs work to highlight using these charts in the upgrade steps

arkodg avatar Apr 20 '25 03:04 arkodg

another decision point + extra TODOs here, is to also add the stable channel Gateway API CRDs in the chart, and decide which one should it default to

arkodg avatar Apr 24 '25 02:04 arkodg

keeping this open to track docs work in the installation / upgrade section

arkodg avatar May 03 '25 01:05 arkodg

chart is available in https://hub.docker.com/r/envoyproxy/gateway-crds-helm, closing this issue for now, and creating a new one to track docs work

arkodg avatar May 07 '25 01:05 arkodg