Problems initializing SRLinux
Hi, I'm deploying the topology 2node-srl-ixr6-with-oc-services.pbtxt, but the containers never stay 'ready' and 'running'; they keep restarting.
Deploying topology:
kne create 2node-srl-ixr6-with-oc-services.pbtxt
I0209 10:00:54.369740 3846339 root.go:119] /home/mw/kne/examples/nokia/srlinux-services
I0209 10:00:54.371543 3846339 topo.go:117] Trying in-cluster configuration
I0209 10:00:54.371573 3846339 topo.go:120] Falling back to kubeconfig: "/home/mw/.kube/config"
I0209 10:00:54.374046 3846339 topo.go:253] Adding Link: srl1:e1-1 srl2:e1-1
I0209 10:00:54.374077 3846339 topo.go:291] Adding Node: srl1:NOKIA
I0209 10:00:54.424631 3846339 topo.go:291] Adding Node: srl2:NOKIA
I0209 10:00:54.459290 3846339 topo.go:358] Creating namespace for topology: "2-srl-ixr6"
I0209 10:00:54.484813 3846339 topo.go:368] Server Namespace: &Namespace{ObjectMeta:{2-srl-ixr6 4b34dc30-d2b2-4340-a901-8967fb08c69e 82945402 0 2024-02-09 10:00:54 +0000 UTC <nil> <nil> map[kubernetes.io/metadata.name:2-srl-ixr6] map[] [] [] [{kne Update v1 2024-02-09 10:00:54 +0000 UTC FieldsV1 {"f:metadata":{"f:labels":{".":{},"f:kubernetes.io/metadata.name":{}}}} }]},Spec:NamespaceSpec{Finalizers:[kubernetes],},Status:NamespaceStatus{Phase:Active,Conditions:[]NamespaceCondition{},},}
I0209 10:00:54.485491 3846339 topo.go:395] Getting topology specs for namespace 2-srl-ixr6
I0209 10:00:54.485510 3846339 topo.go:324] Getting topology specs for node srl1
I0209 10:00:54.485574 3846339 topo.go:324] Getting topology specs for node srl2
I0209 10:00:54.485610 3846339 topo.go:402] Creating topology for meshnet node srl1
I0209 10:00:54.507333 3846339 topo.go:402] Creating topology for meshnet node srl2
I0209 10:00:54.522376 3846339 topo.go:375] Creating Node Pods
I0209 10:00:54.522726 3846339 nokia.go:201] Creating Srlinux node resource srl1
I0209 10:00:54.537059 3846339 nokia.go:206] Created SR Linux node srl1 configmap
I0209 10:00:54.631596 3846339 nokia.go:265] Created Srlinux resource: srl1
I0209 10:00:54.764968 3846339 topo.go:380] Node "srl1" resource created
I0209 10:00:54.765040 3846339 nokia.go:201] Creating Srlinux node resource srl2
I0209 10:00:54.780052 3846339 nokia.go:206] Created SR Linux node srl2 configmap
I0209 10:00:54.910542 3846339 nokia.go:265] Created Srlinux resource: srl2
I0209 10:00:55.028768 3846339 topo.go:380] Node "srl2" resource created
I0209 10:04:15.460792 3846339 topo.go:448] Node "srl1": Status RUNNING
Status of the pods:
k get pods -n 2-srl-ixr6
NAME READY STATUS RESTARTS AGE
srl1 0/1 Running 1 (9s ago) 13s
srl2 0/1 Running 1 (9s ago) 13s
k get pods -n 2-srl-ixr6
NAME READY STATUS RESTARTS AGE
srl1 0/1 Init:CrashLoopBackOff 1 (8s ago) 16s
srl2 0/1 Init:CrashLoopBackOff 1 (8s ago) 16s
k get pods -n 2-srl-ixr6
NAME READY STATUS RESTARTS AGE
srl1 0/1 Error 2 32s
srl2 0/1 Error 2 32s
Events for the container srl1:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 6m9s default-scheduler Successfully assigned 2-srl-ixr6/srl1 to k8worker4
Normal Killing 5m59s (x2 over 6m5s) kubelet Stopping container srl1
Warning BackOff 5m56s kubelet Back-off restarting failed container init-srl1 in pod srl1_2-srl-ixr6(600952b1-695d-44c3-95a0-a68ba2f9be5a)
Normal SandboxChanged 5m55s (x3 over 6m5s) kubelet Pod sandbox changed, it will be killed and re-created.
Normal Pulled 5m50s (x3 over 6m8s) kubelet Container image "ghcr.io/srl-labs/init-wait:latest" already present on machine
Normal Created 5m50s (x3 over 6m8s) kubelet Created container init-srl1
Normal Started 5m50s (x3 over 6m7s) kubelet Started container init-srl1
Warning BackOff 5m50s kubelet Back-off restarting failed container srl1 in pod srl1_2-srl-ixr6(600952b1-695d-44c3-95a0-a68ba2f9be5a)
Normal Pulled 5m49s (x3 over 6m6s) kubelet Container image "ghcr.io/nokia/srlinux" already present on machine
Normal Created 5m48s (x3 over 6m6s) kubelet Created container srl1
Normal Started 5m48s (x3 over 6m6s) kubelet Started container srl1
You need to have a license for srlinux
The documentation at https://learn.srlinux.dev/tutorials/infrastructure/kne/installation/#license mentions that SRLinux can be used without a license by removing certain fields. I have tried that, but the error I described above still occurs.
Unless you share the exact topology you are trying to start, it is not possible to answer any questions.
The topology I am testing is exactly the same as the one provided in the example repository, https://github.com/openconfig/kne/blob/main/examples/nokia/srlinux-services/2node-srl-ixr6-with-oc-services.pbtxt
It can't be the same, since you should have removed the ixr6e model from the topology and the openconfig models from the config.
I'm sorry for any confusion. I meant to say that the topology I'm using is based on the example from the repository. I've tested it both with and without the 'ixr6e' model and the OpenConfig models in the configuration. However, I get the same result in both cases.
You need to investigate pod logs to understand the reason; most likely, you need more changes than a simple removal of the model & openconfig container. There are a few other things in the cfg that are not supported on other platforms.
Starting with the default config is your best bet; reusing the configs from the ixr6/10 examples on other platforms is unlikely to give you good results.
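For reference, a minimal sketch of how the pod logs could be collected. The container names (`init-<pod>` and `<pod>`) and the namespace follow the events posted above; the `dump_pod_logs` helper and the `KUBECTL` override are only illustrative names, not part of KNE or srl-controller.

```shell
#!/bin/sh
# Hypothetical helper: prints the logs of the previous (crashed) instances
# of a pod's init container and main container, then the pod description.
# Assumes the containers are named "init-<pod>" and "<pod>", as in the
# events shown earlier in this thread.
KUBECTL="${KUBECTL:-kubectl}"   # set KUBECTL=echo for a dry run

dump_pod_logs() {
    ns="$1"
    pod="$2"
    # Logs of the last terminated instance of the init container.
    "$KUBECTL" logs -n "$ns" "$pod" -c "init-$pod" --previous
    # Logs of the last terminated instance of the main container.
    "$KUBECTL" logs -n "$ns" "$pod" -c "$pod" --previous
    # Container states, exit codes and last termination reason.
    "$KUBECTL" describe pod -n "$ns" "$pod"
}
```

For the topology in this thread it would be called as `dump_pod_logs 2-srl-ixr6 srl1`. The `--previous` flag matters here because the containers restart within seconds, so the current instance may not have logged the failure yet.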
Hi,
I still get the pod restart error I described above. The documentation at https://learn.srlinux.dev/tutorials/infrastructure/kne/installation/#__tabbed_2_1 indicates that the tutorial was tested on a kind cluster. In my case, the restarts occur when I deploy the pods on an external cluster that was created with kubeadm, not kind.
I see the following errors in the srlinux-controller logs when the pods are created:
1.7104141231090307e+09 INFO updating srlinux status {"controller": "srlinux", "controllerGroup": "kne.srlinux.dev", "controllerKind": "Srlinux", "Srlinux": {"name":"srl1","namespace":"2srl-prueba-2"}, "namespace": "2srl-prueba-2", "name": "srl1", "reconcileID": "f0b1efe1-1c56-44a9-a205-6dd38b58f561", "srlinux-status": {"status":"Pending","image":"ghcr.io/nokia/srlinux:latest","startup-config":{}}}
1.7104141231321757e+09 **ERROR** failed to update Srlinux status {"controller": "srlinux", "controllerGroup": "kne.srlinux.dev", "controllerKind": "Srlinux", "Srlinux": {"name":"srl1","namespace":"2srl-prueba-2"}, "namespace": "2srl-prueba-2", "name": "srl1", "reconcileID": "f0b1efe1-1c56-44a9-a205-6dd38b58f561", "error": "Operation cannot be fulfilled on srlinuxes.kne.srlinux.dev \"srl1\": the object has been modified; please apply your changes to the latest version and try again"}
github.com/srl-labs/srl-controller/controllers.(*SrlinuxReconciler).updateSrlinuxStatus
/workspace/controllers/srlinux_controller.go:265
github.com/srl-labs/srl-controller/controllers.(*SrlinuxReconciler).Reconcile
/workspace/controllers/srlinux_controller.go:123
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile
/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:121
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:320
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:273
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:234
1.7104141231323476e+09 **ERROR** Reconciler error {"controller": "srlinux", "controllerGroup": "kne.srlinux.dev", "controllerKind": "Srlinux", "Srlinux": {"name":"srl1","namespace":"2srl-prueba-2"}, "namespace": "2srl-prueba-2", "name": "srl1", "reconcileID": "f0b1efe1-1c56-44a9-a205-6dd38b58f561", "error": "Operation cannot be fulfilled on srlinuxes.kne.srlinux.dev \"srl1\": the object has been modified; please apply your changes to the latest version and try again"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:326
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:273
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:234
I have also tested deploying SRLinux on a kind cluster, and the problem does not occur there.
Has anyone had the same problem on a cluster other than kind, and does anyone know how to solve it?
This error on its own doesn't lead to any issues. The reconciliation should still happen. If your pods are not coming up, then something else is preventing it, not the reconciliation error. I have seen this error in my clusters too, but it is transient and goes away.
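To check whether the controller reports anything besides this transient conflict, a small filter along these lines could help. It is only a sketch: `other_errors` is a hypothetical helper name, and it simply reads controller log lines on stdin.

```shell
#!/bin/sh
# Hypothetical helper: reads controller log lines on stdin and prints only
# the ERROR lines that are not the transient "object has been modified"
# update conflict. Stack-trace lines are dropped because they do not
# contain the word ERROR.
other_errors() {
    grep 'ERROR' | grep -v 'the object has been modified' || true
}
```

The output of `kubectl logs` for the controller pod can be piped into `other_errors`. If nothing is printed, the controller logs contain no errors other than the conflict, and the cause of the restarts is more likely to be found in the pod logs themselves.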