
Enabling `enable-local-redirect-policy` indefinitely waits on CRD registration

Open jlaffaye opened this issue 3 years ago • 4 comments

Bug report

General Information

  • Cilium CLI version (run cilium version): cilium-cli: 0.11.10 compiled with go1.18.3 on darwin/arm64
  • Orchestration system version in use (e.g. kubectl version, ...): v1.22.7
  • Platform / infrastructure information (e.g. AWS / Azure / GCP, image / kernel versions): GKE
  • Link to relevant artifacts (policies, deployments scripts, ...)
  • Generate and upload a system zip: cilium sysdump

How to reproduce the issue

  1. cilium config set enable-local-redirect-policy true
  2. All agents are restarted but fail to start, waiting indefinitely with the message "waiting for all CRDs".

Restarting cilium-operator fixes the issue by creating the ciliumlocalredirectpolicies.cilium.io CRD.

Not sure if it's a CLI issue (the CLI should restart cilium-operator) or an operator issue (the operator should pick up the ConfigMap change without a restart).
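
One way to confirm this state is to check whether the CRD is registered at all. The following is a minimal sketch, assuming `kubectl` access to the cluster; the function name `lrp_crd_status` and the `KUBECTL` override variable are hypothetical and introduced only for illustration.

```shell
#!/bin/sh
# Sketch: report whether the CiliumLocalRedirectPolicy CRD is registered.
# KUBECTL can be overridden (hypothetical knob) to try the function without a cluster.
KUBECTL="${KUBECTL:-kubectl}"
LRP_CRD="ciliumlocalredirectpolicies.cilium.io"

lrp_crd_status() {
  # `kubectl get crd <name>` exits non-zero when the CRD does not exist.
  if "$KUBECTL" get crd "$LRP_CRD" >/dev/null 2>&1; then
    echo "registered"
  else
    echo "missing"
  fi
}
```

If `lrp_crd_status` prints `missing` while the agents are stuck, that matches the report above. The restart would then be something like `kubectl -n kube-system rollout restart deployment/cilium-operator`, assuming the operator runs as the `cilium-operator` Deployment in `kube-system`.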

jlaffaye avatar Jun 23 '22 16:06 jlaffaye

I saw the same issue on 1.11 and 1.13.1. I can provide more info if needed.

Jiang1155 avatar May 05 '23 00:05 Jiang1155

@aditighag MBOI

brb avatar May 05 '23 05:05 brb

Hi @jlaffaye, sorry, it looks like this issue fell through the cracks.

> restarting cilium-operator fixes the issue by creating the ciliumlocalredirectpolicies.cilium.io CRD. Not sure if its a CLI issue that should restart cilium-operator or an operator issue that should pickup the configmap change without a restart.

That's a fair point! cilium config set internally restarts cilium agent pods by default, so maybe it also makes sense to restart cilium-operator. Cilium operator is tasked with registering all CRDs, and so you may see this issue for other features as well. I'll bring this up in the community meeting to get more insights. You are welcome to join the discussion.
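
Until the CLI or operator behavior changes, the combined sequence can be scripted by hand. This is only a sketch of the idea, not something cilium-cli provides: the wrapper name `config_set_with_operator_restart`, the `KUBECTL`/`CILIUM`/`NAMESPACE` variables, and the 30-attempt poll are assumptions, as is the operator running as the `cilium-operator` Deployment in `kube-system`.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of cilium-cli): set a config key, restart the
# operator as well, then wait for the CRD the feature depends on.
KUBECTL="${KUBECTL:-kubectl}"
CILIUM="${CILIUM:-cilium}"
NAMESPACE="${NAMESPACE:-kube-system}"   # assumption: Cilium is installed here

config_set_with_operator_restart() {
  key="$1"; value="$2"; crd="$3"

  # Per the thread, this restarts the agent pods by default.
  "$CILIUM" config set "$key" "$value" || return 1

  # Restart the operator so it re-reads the ConfigMap and registers new CRDs.
  "$KUBECTL" -n "$NAMESPACE" rollout restart deployment/cilium-operator || return 1
  "$KUBECTL" -n "$NAMESPACE" rollout status deployment/cilium-operator --timeout=120s || return 1

  # Poll until the CRD exists, giving up after 30 attempts (about a minute).
  attempts=0
  until "$KUBECTL" get crd "$crd" >/dev/null 2>&1; do
    attempts=$((attempts + 1))
    if [ "$attempts" -ge 30 ]; then
      return 1
    fi
    sleep 2
  done
}
```

For the case in this issue, the call would be `config_set_with_operator_restart enable-local-redirect-policy true ciliumlocalredirectpolicies.cilium.io`.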

aditighag avatar May 07 '23 13:05 aditighag

In case anyone finds this issue via search: restarting the Cilium operator got the ball rolling for us.

We then got stuck configuring CiliumLocalRedirectPolicy. In our environment with Cilium 1.14.4, we had to specify .spec.redirectFrontend.serviceMatcher.toPorts; otherwise the LRP list shows no backend endpoints and traffic is not routed to the node-local DNS cache.

Here are the steps to verify:

  1. Create this CiliumLocalRedirectPolicy without .spec.redirectFrontend.serviceMatcher.toPorts. This is as described in the documentation and the provided example.
apiVersion: cilium.io/v2
kind: CiliumLocalRedirectPolicy
metadata:
  name: node-local-dns
  namespace: kube-system
spec:
  redirectBackend:
    localEndpointSelector:
      matchLabels:
        k8s-app: node-local-dns
    toPorts:
    - name: dns
      port: "53"
      protocol: UDP
    - name: dns-tcp
      port: "53"
      protocol: TCP
  redirectFrontend:
    serviceMatcher:
      namespace: kube-system
      serviceName: rke2-coredns-rke2-coredns
  2. View your LRP list. No backend pods appear on the right side of the arrow, and traffic is not directed to the node-local DNS cache.
❯ kubectl -n kube-system exec ds/cilium -- cilium lrp list
LRP namespace   LRP name         FrontendType                Matching Service
kube-system     node-local-dns   clusterIP + all svc ports   kube-system/rke2-coredns-rke2-coredns
                |                10.43.0.10:53/UDP ->
                |                10.43.0.10:53/TCP ->
  3. Delete the existing LRP. At the time of writing, a limitation prevents modifying an existing LRP.
kubectl delete ciliumlocalredirectpolicies -n kube-system node-local-dns
  4. Create a new CiliumLocalRedirectPolicy with .spec.redirectFrontend.serviceMatcher.toPorts.
apiVersion: cilium.io/v2
kind: CiliumLocalRedirectPolicy
metadata:
  name: node-local-dns
  namespace: kube-system
spec:
  redirectBackend:
    localEndpointSelector:
      matchLabels:
        k8s-app: node-local-dns
    toPorts:
    - name: dns
      port: "53"
      protocol: UDP
    - name: dns-tcp
      port: "53"
      protocol: TCP
  redirectFrontend:
    serviceMatcher:
      namespace: kube-system
      serviceName: rke2-coredns-rke2-coredns
      toPorts:
        - name: dns-tcp
          port: "53"
          protocol: TCP
        - name: dns
          port: "53"
          protocol: UDP
  5. Verify that the LRP list shows backend pods and that traffic is routed to the node-local DNS cache. The output shows your node-local-dns pod IPs.
❯ kubectl -n kube-system exec ds/cilium -- cilium lrp list
LRP namespace   LRP name         FrontendType              Matching Service
kube-system     node-local-dns   clusterIP + named ports   kube-system/rke2-coredns-rke2-coredns
                |                10.43.0.10:53/TCP -> 10.42.3.193:53(kube-system/node-local-dns-sbtdr),
                |                10.43.0.10:53/UDP -> 10.42.3.193:53(kube-system/node-local-dns-sbtdr),
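
The check in the verification steps can also be done mechanically by looking for frontend lines with nothing after the arrow. This is a rough sketch that assumes the `cilium lrp list` output has the layout shown above, which may differ in other Cilium versions; `lrp_frontends_without_backends` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: read `cilium lrp list` output on stdin and print every frontend
# whose "->" has no backend after it.
lrp_frontends_without_backends() {
  awk '
    # Mapping lines look like: | <frontend> -> [<backend>,]
    # Exactly three fields means the backend part is empty.
    $1 == "|" && $3 == "->" && NF == 3 { print $2 }
  '
}
```

For example: `kubectl -n kube-system exec ds/cilium -- cilium lrp list | lrp_frontends_without_backends`. Empty output would suggest every frontend has at least one backend on that node.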

atsai1220 avatar Apr 05 '24 08:04 atsai1220