Unable to override cilium-agent config with CiliumNodeConfigs

Open admun opened this issue 10 months ago • 2 comments

/kind bug

1. What kops version are you running? The command kops version will display this information.

kopsVersion: 1.30.3

2. What Kubernetes version are you running? kubectl version will print the version if a cluster is running or provide the Kubernetes version specified as a kops flag.

v1.30.7

3. What cloud provider are you using?

AWS

4. What commands did you run? What is the simplest way to reproduce this issue?

On a Kubernetes 1.30 cluster bootstrapped by kOps with the Cilium networking add-on:

  1. Deploy the CiliumNodeConfig CRD, i.e. this yaml (see here for how this CR works)
  2. Create a CiliumNodeConfig object (see below)
  3. Restart all Cilium pods
  4. Run kubectl exec <cilium pod> -n <ns> -- cilium config | grep -i policyaudit

The test object:

apiVersion: cilium.io/v2alpha1
kind: CiliumNodeConfig
metadata:
  name: policy-audit-mode-override
  namespace: kube-system
spec:
  defaults:
    policy-audit-mode: "true"
  nodeSelector:
    matchLabels: {}

5. What happened after the commands executed?

PolicyAuditMode is Disabled

6. What did you expect to happen?

PolicyAuditMode should be Enabled

7. Please provide your cluster manifest. Execute kops get --name my.example.com -o yaml to display your cluster manifest. You may want to remove your cluster name and other sensitive information.

8. Please run the commands with most verbose logging by adding the -v 10 flag. Paste the logs into this report, or in a gist and provide the gist link here.

9. Anything else we need to know?

After looking into the kOps template that deploys Cilium, we found that:

  1. There is a config initContainer that reads cilium-config, applies the override values, and writes the resulting config to /tmp/cilium/config-map under a /tmp volume mount
  2. The cilium-agent container is configured to read its config from /tmp/cilium/config-map, as expected
  3. However, that container also mounts the cilium-config ConfigMap at /tmp/cilium/config-map, which hides the node-level config generated by the initContainer and effectively rolls back the overridden values

The solution is to remove the unneeded ConfigMap volume mount from the cilium-agent container at https://github.com/kubernetes/kops/blob/release-1.31/upup/models/cloudup/resources/addons/networking.cilium.io/k8s-1.16-v1.15.yaml.template#L1198-L1200
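
To make the conflict easier to see, here is a rough sketch of the mount layout described above. It is not copied from the kOps template: the volume names (tmp, cilium-config-path), the --config-dir argument, and the overall layout are assumptions for illustration, and unrelated fields such as image are omitted.

```yaml
# Illustrative excerpt of a Cilium DaemonSet pod spec (assumed layout, not the kOps template).
spec:
  initContainers:
    - name: config
      # Writes the effective per-node config to /tmp/cilium/config-map.
      volumeMounts:
        - name: tmp
          mountPath: /tmp
  containers:
    - name: cilium-agent
      args:
        - --config-dir=/tmp/cilium/config-map
      volumeMounts:
        - name: tmp
          mountPath: /tmp
        # Problematic mount: it sits on top of the directory the initContainer
        # wrote to, so the agent sees the unmodified cilium-config ConfigMap.
        - name: cilium-config-path
          mountPath: /tmp/cilium/config-map
          readOnly: true
  volumes:
    - name: tmp
      emptyDir: {}
    - name: cilium-config-path
      configMap:
        name: cilium-config
```

In this sketch, dropping the cilium-config-path entry from the agent's volumeMounts would leave only the /tmp mount, so the agent would read the files produced by the initContainer.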

admun, Feb 19 '25 23:02

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot, May 20 '25 23:05

/remove-lifecycle stale

We are in the process of testing the fix in kOps 1.32 and will close this issue once we confirm it works.

admun, May 28 '25 16:05