kata-deploy: Add Helm Chart

Open zvonkok opened this issue 1 year ago • 15 comments

For easier handling of kata-deploy we can leverage a Helm chart and get rid of all the kustomize base and overlay manifests for the various components.

zvonkok avatar Jun 19 '24 09:06 zvonkok

By default the appVersion would be set to VERSION; one can override it by simply running:

helm install kata-deploy --set image.tag=latest ./kata-deploy-0.1.0.tgz

A default

helm install kata-deploy ./kata-deploy-0.1.0.tgz

would give one the latest release.

zvonkok avatar Jun 19 '24 09:06 zvonkok

The Helm chart can easily be hosted via github.io under kata-containers.

zvonkok avatar Jun 19 '24 09:06 zvonkok

With this we can also automate chart publishing for each release using https://github.com/helm/chart-releaser

zvonkok avatar Jun 19 '24 11:06 zvonkok

@beraldoleal Take a look at the second commit to see how we can use Helm to render the correct YAMLs without changing them in place, which would make the repository dirty.
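
A minimal sketch of that rendering step, not the exact code in the commit: `helm template` renders the chart to stdout and takes overrides via `--set`, so no file in the repository is modified. The release name `kata-deploy`, the `kube-system` namespace, and the function name are assumptions for illustration; the chart path and value keys are the ones used in this PR.

```shell
#!/usr/bin/env bash
# Sketch: render the kata-deploy manifests without editing any file in the repo.
# Assumptions (hypothetical): release name "kata-deploy", namespace "kube-system".

CHART_DIR="tools/packaging/kata-deploy/helm-chart/kata-deploy"

render_kata_deploy() {
    local image_tag="$1"
    local k8s_distribution="${2:-k8s}"
    # helm template only renders; nothing is installed and nothing on disk changes.
    helm template kata-deploy "${CHART_DIR}" \
        --namespace kube-system \
        --set image.tag="${image_tag}" \
        --set k8sDistribution="${k8s_distribution}"
}
```

The rendered output could then be piped to the cluster, for example `render_kata_deploy latest k3s | kubectl apply -f -`, while `git status` stays clean.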

FYI @fidencio @ryansavino

zvonkok avatar Jun 20 '24 17:06 zvonkok

kata-deploy.yaml and kata-cleanup.yaml are the same manifests and scripts, just with different arguments. The next commit will clean this up.

zvonkok avatar Jun 20 '24 17:06 zvonkok

cleanup_kata_deploy is now really simple, just uninstall the helm release if found. All the needed data is encapsulated in the deployed release.
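
Roughly, the idea can be sketched as follows. This is an illustration under assumptions, not the function as committed: the default release name and namespace are hypothetical, and only `helm status` and `helm uninstall` are used.

```shell
#!/usr/bin/env bash
# Sketch of a Helm-based cleanup: uninstall the release only if it exists.
# Assumptions (hypothetical): the default release name and namespace below.

HELM_RELEASE="${HELM_RELEASE:-kata-deploy}"
HELM_NAMESPACE="${HELM_NAMESPACE:-kube-system}"

cleanup_kata_deploy() {
    # helm status exits non-zero when the release is not found.
    if ! helm status "${HELM_RELEASE}" --namespace "${HELM_NAMESPACE}" >/dev/null 2>&1; then
        echo "release ${HELM_RELEASE} not found, nothing to clean up"
        return 0
    fi
    # The release records everything it deployed, so no manifests are needed here.
    helm uninstall "${HELM_RELEASE}" --namespace "${HELM_NAMESPACE}"
}
```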

zvonkok avatar Jun 20 '24 18:06 zvonkok

With this we should be able to remove "ALL" base/overlay kustomize manifests.

zvonkok avatar Jun 20 '24 18:06 zvonkok

TODO follow up PRs:

  • [ ] Add Helm chart publishing to release.yaml
  • [ ] Add Helm chart for each PR build
  • [ ] Latest PR build will also upload a latest Helm chart.
  • [ ] We need a "place" where we can publish our Helm charts. It would be nice to have one Helm chart per PR.

zvonkok avatar Jun 20 '24 18:06 zvonkok

I've built a simple GHA job that automatically uploads a Helm chart for each PR to a specific repository. In my case it's zvonkok/helm-charts, which is hosted via github.io.

Moving forward, replace zvonkok/helm-charts with kata-containers/helm-charts; I'm using zvonkok/helm-charts here only for illustration purposes.

helm repo add kata-containers https://zvonkok.github.io/helm-charts
helm repo update
helm search repo --devel -o json 
[{"name":"kata-containers/kata-deploy","version":"3.6.0-dev+24-4d45fd9818726d7d3a37cfd4ad1281bea29b67c2","app_version":"3.6.0-dev+24-4d45fd9818726d7d3a37cfd4ad1281bea29b67c2","description":"A Helm chart for deploying Kata Containers"}]

The input.tag (pr-githash) is defined by the GHA, and I am using VERSION plus the tag to create a new semver version for Helm to consume.
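
As a sketch of how such a version string can be composed (the function name is hypothetical; the `-dev` pre-release label and `+<tag>` build metadata follow the version shown in the search output above):

```shell
#!/usr/bin/env bash
# Sketch: derive a SemVer chart version for a PR build.
# Assumption: "-dev" pre-release label plus "+<tag>" build metadata,
# matching the version string in the helm search output.

chart_version_for_pr() {
    local base_version="$1"   # contents of the VERSION file, e.g. 3.6.0
    local tag="$2"            # <pr-number>-<git-hash>, supplied by the workflow
    printf '%s-dev+%s\n' "${base_version}" "${tag}"
}
```

The result can be passed to `helm package` via its `--version` and `--app-version` flags, for example `helm package --version "$(chart_version_for_pr "$(cat VERSION)" "${TAG}")" <chart-dir>`, where `TAG` stands for the workflow input.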

zvonkok avatar Jun 21 '24 12:06 zvonkok

Additionally, values.yaml would be updated so that the chart points to the correct payload:

imagePullPolicy: Always
imagePullSecrets: []
image:
  reference: ghcr.io/zvonkok/kata-deploy-ci/kata-deploy
  tag: 24-b9d7f9333087e5ee789af8f37dd26df3b3e308e0
# k8s-dist can be k8s, k3s, rke2, k0s
k8sDistribution: "k8s"
env:
  debug: "false"
  shims: "clh cloud-hypervisor dragonball fc qemu qemu-coco-dev qemu-runtime-rs qemu-sev qemu-snp qemu-tdx stratovirt qemu-nvidia-gpu qemu-nvidia-gpu-snp qemu-nvidia-gpu-tdx"
  defaultShim: "qemu"
  createRuntimeClasses: "false"
  createDefaultRuntimeClass: "false"
  allowedHypervisorAnnotations: ""
  snapshotterHandlerMapping: ""
  agentHttpProxy: ""
  agentNoProxy: ""
  pullTypeMapping: ""
  hostOS: ""

zvonkok avatar Jun 21 '24 12:06 zvonkok

For each PR we would have a Helm chart kata-deploy-VERSION-dev+{{ input.tag }}.tar.gz, and for each release we would have a kata-deploy-VERSION.tar.gz with an updated values.yaml, also pushed as an artifact in the release payload.

Users can then do a helm repo update; without the --devel flag they will not see any kata-deploy-VERSION-dev+{{ input.tag }}.tar.gz charts, only the release charts.
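
The reason is that Helm treats any version with a pre-release label (a `-` suffix such as `-dev`) as a development version and leaves it out of search results unless `--devel` or an explicit `--version` is given. The check below is only an illustration of that rule, not Helm's implementation, and the function name is hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: decide whether a SemVer string carries a pre-release label.
# Illustration only; Helm does this internally with its own SemVer parser.

is_prerelease() {
    # Drop build metadata (everything from the first "+"), then look for "-".
    local core="${1%%+*}"
    case "${core}" in
        *-*) return 0 ;;
        *)   return 1 ;;
    esac
}
```

With the repository from above, `helm search repo kata-containers/kata-deploy` would list only release charts, while adding `--devel` would also list the per-PR charts.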

zvonkok avatar Jun 21 '24 12:06 zvonkok

I was looking through the failed test runs for the amd node jobs. Let me know if you want to spend some time troubleshooting the failures together. Looks like it wasn't able to pull the kata-deploy image.

ryansavino avatar Jun 24 '24 20:06 ryansavino

@ryansavino Found the error. Thanks for the offer :)

zvonkok avatar Jun 26 '24 12:06 zvonkok

@mkulke Good point, adding this to the list of follow up items: https://github.com/kata-containers/kata-containers/issues/9924

zvonkok avatar Jun 27 '24 07:06 zvonkok

Please rebase this PR onto main when you want to re-trigger the whole set of checks (by pushing something, etc.), as #9923 resolves the issue for the zvsi tests. Thanks.

BbolroC avatar Jun 27 '24 08:06 BbolroC

@zvonkok, it's taking me some time to get to this, but I'd like to ensure this works with TDX first. I will be force-pushing to your branch.

fidencio avatar Aug 01 '24 08:08 fidencio

I force-pushed here to rebase.

fidencio avatar Aug 01 '24 08:08 fidencio

Found the issue, agent.https_proxy is not being properly set!

fidencio avatar Aug 01 '24 18:08 fidencio

git diff
diff --git a/tools/packaging/kata-deploy/helm-chart/kata-deploy/templates/kata-deploy.yaml b/tools/packaging/kata-deploy/helm-chart/kata-deploy/templates/kata-deploy.yaml
index 714e172e44..0d3565da38 100644
--- a/tools/packaging/kata-deploy/helm-chart/kata-deploy/templates/kata-deploy.yaml
+++ b/tools/packaging/kata-deploy/helm-chart/kata-deploy/templates/kata-deploy.yaml
@@ -47,7 +47,7 @@ spec:
         - name: SNAPSHOTTER_HANDLER_MAPPING
           value: {{ .Values.env.snapshotterHandlerMapping | quote }}
         - name: AGENT_HTTPS_PROXY
-          value: {{ .Values.env.agentHttpProxy | quote }}
+          value: {{ .Values.env.agentHttpsProxy | quote }}
         - name: AGENT_NO_PROXY
           value: {{ .Values.env.agentNoProxy | quote }}
         - name: PULL_TYPE_MAPPING
diff --git a/tools/packaging/kata-deploy/helm-chart/kata-deploy/values.yaml b/tools/packaging/kata-deploy/helm-chart/kata-deploy/values.yaml
index 004137a147..b1f195d1f1 100644
--- a/tools/packaging/kata-deploy/helm-chart/kata-deploy/values.yaml
+++ b/tools/packaging/kata-deploy/helm-chart/kata-deploy/values.yaml
@@ -13,7 +13,7 @@ env:
   createDefaultRuntimeClass: "false"
   allowedHypervisorAnnotations: ""
   snapshotterHandlerMapping: ""
-  agentHttpProxy: ""
+  agentHttpsProxy: ""
   agentNoProxy: ""
   pullTypeMapping: ""
   hostOS: ""

This will solve the issue.

fidencio avatar Aug 01 '24 18:08 fidencio

Force pushed, probably fixing the issue.

fidencio avatar Aug 01 '24 18:08 fidencio

For snp, the k8s-policy-hard-coded.bats test keeps failing. Looks like the pods aren't starting. I think this is probably unrelated, but I'd like to troubleshoot and diagnose a bit further. The nightly CI seems to be passing fine.

Do you think a rebase may help here?

ryansavino avatar Aug 05 '24 18:08 ryansavino

What happened here was that the PR was last rebased before the commit adding that test landed, and at that time the SNP CI was broken, with failures happening even before the tests started. You guys fixed the issue there, but meanwhile the test you mentioned was merged, so the auto-rebase picked it up as part of the tests to run, and we ended up with that test failing.

I've rebased now, and this should give us a green CI everywhere.

fidencio avatar Aug 06 '24 07:08 fidencio

Great. Thanks for explaining that. I was a bit confused. Approving.

ryansavino avatar Aug 06 '24 20:08 ryansavino