Unable to override cilium-agent config with CiliumNodeConfigs
/kind bug
1. What kops version are you running? The command kops version, will display
this information.
kopsVersion: 1.30.3
2. What Kubernetes version are you running? kubectl version will print the
version if a cluster is running or provide the Kubernetes version specified as
a kops flag.
v1.30.7
3. What cloud provider are you using?
AWS
4. What commands did you run? What is the simplest way to reproduce this issue?
From a 1.30 k8s cluster bootstrapped by kOps with cilium networking add-on:
- deploy CiliumNodeConfigs CRD, i.e. this yaml (see here for how this CR works)
- create a CiliumNodeConfig object (see below)
- restart all cilium pods
- run
kubectl exec <cilium pod> -n <ns> -- cilium config | grep -i policyaudit
The test object:
apiVersion: cilium.io/v2alpha1
kind: CiliumNodeConfig
metadata:
  name: policy-audit-mode-override
  namespace: kube-system
spec:
  defaults:
    policy-audit-mode: "true"
  nodeSelector:
    matchLabels: {}
5. What happened after the commands executed?
PolicyAuditMode is Disabled
6. What did you expect to happen?
PolicyAuditMode should be Enabled
7. Please provide your cluster manifest. Execute
kops get --name my.example.com -o yaml to display your cluster manifest.
You may want to remove your cluster name and other sensitive information.
8. Please run the commands with most verbose logging by adding the -v 10 flag.
Paste the logs into this report, or in a gist and provide the gist link here.
9. Anything else do we need to know?
After looking into the kOps template that deploys Cilium, we found that:
- There is a config initContainer that reads cilium-config, applies the override values, and writes the resulting config to /tmp/cilium/config-map under a /tmp volume mount
- The cilium-agent container is configured to read its config from /tmp/cilium/config-map, as expected
- However, that container also mounts the cilium-config ConfigMap at /tmp/cilium/config-map. This mount hides the node-level config generated by the initContainer, effectively rolling back the overridden values
The solution is to remove the unneeded ConfigMap volume mount from the cilium-agent container at https://github.com/kubernetes/kops/blob/release-1.31/upup/models/cloudup/resources/addons/networking.cilium.io/k8s-1.16-v1.15.yaml.template#L1198-L1200
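To make the conflict concrete, here is a rough sketch of the relevant part of the cilium DaemonSet pod spec as we understand it. It is not copied from the kOps template: the volume names (`tmp`, `cilium-config-path`), the `--config-dir` argument, and the overall layout are assumptions for illustration, so check them against the template lines linked above.

```yaml
# Illustrative sketch only. Volume names and args are assumptions,
# not a verbatim excerpt of the kOps template.
initContainers:
  - name: config
    # Reads cilium-config, applies matching CiliumNodeConfig overrides,
    # and writes the merged result to /tmp/cilium/config-map.
    volumeMounts:
      - name: tmp
        mountPath: /tmp
containers:
  - name: cilium-agent
    args:
      - --config-dir=/tmp/cilium/config-map
    volumeMounts:
      - name: tmp
        mountPath: /tmp
      # Proposed for removal: this ConfigMap mount sits on top of the
      # directory written by the initContainer, so the agent reads the
      # unmodified cilium-config instead of the merged node-level config.
      - name: cilium-config-path
        mountPath: /tmp/cilium/config-map
        readOnly: true
volumes:
  - name: tmp
    emptyDir: {}
  - name: cilium-config-path
    configMap:
      name: cilium-config
```

With the second mount removed, the agent should read the files the initContainer wrote into the shared /tmp volume, which is where the CiliumNodeConfig overrides end up.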
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
We are testing the fix in kOps 1.32 and will close this issue once we confirm it works.