calico-kube-controller crashes with "concurrent map read and map write"
Expected Behavior
calico-kube-controller should not crash.
Current Behavior
calico-kube-controller exits every few hours with "fatal error: concurrent map read and map write". Running the controller with debug logging reduces the frequency of the race condition to about once a day.
Possible Solution
Add a mutex lock to the cache Set function (PR 8706).
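To illustrate the idea, here is a minimal sketch of a cache whose Set holds a mutex across both the reflect.DeepEqual comparison and the write. It is not the code from the PR or the actual calicoCache implementation: the safeCache type, its fields, and newSafeCache are hypothetical names used only for this example.

```go
package main

import (
	"reflect"
	"sync"
)

// safeCache is a hypothetical stand-in for the kube-controllers cache.
// It is NOT the real calicoCache type; names here are assumptions.
type safeCache struct {
	mu    sync.Mutex
	items map[string]interface{}
}

// newSafeCache returns an empty cache ready for use.
func newSafeCache() *safeCache {
	return &safeCache{items: make(map[string]interface{})}
}

// Set stores value under key and reports whether the stored value changed.
// The lock is held across the DeepEqual comparison and the write, so two
// concurrent Set calls cannot compare and write at the same time.
func (c *safeCache) Set(key string, value interface{}) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	if existing, ok := c.items[key]; ok && reflect.DeepEqual(existing, value) {
		return false // unchanged, nothing to do
	}
	c.items[key] = value
	return true
}

// Get returns the value stored under key, taking the same lock as Set.
func (c *safeCache) Get(key string) (interface{}, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	v, ok := c.items[key]
	return v, ok
}
```

A lock like this only helps if every reader and writer of the shared map goes through it. If a caller keeps a reference to a map stored in the cache (for example a Labels map) and modifies it outside the lock, DeepEqual can still race with that write, so copying such values before storing them may also be needed.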
Steps to Reproduce (for bugs)
Run Calico in a production environment with a high rate of pod and node state changes. (We have not been able to reproduce this behavior on a non-production cluster.)
Context
2024-03-21 16:16:16.233 [DEBUG][1] cache.go 133: converter.WorkloadEndpointData{PodName:"daemonset-daemonset-extended-1711037756-1711037762-vgqms", Namespace:"kuberhealthy", Labels:map[string]string{"app":"daemonset-daemonset-extended-1711037756-1711037762", "checkRunTime":"1711037762", "controller-revision-hash":"7f95f86fd5", "creatingInstance":"daemonset-extended-1711037756", "khcheck":"daemonset", "pod-template-generation":"1", "projectcalico.org/namespace":"kuberhealthy", "projectcalico.org/orchestrator":"k8s", "projectcalico.org/serviceaccount":"default", "source":"kuberhealthy"}, ServiceAccount:"default"} already exists in cache - comparing. type=converter.WorkloadEndpointData
2024-03-21 16:16:16.233 [DEBUG][1] workload_endpoint_default.go 56: Using prefix to create a WorkloadEndpoint veth name prefix="cali"
fatal error: concurrent map read and map write
goroutine 230 [running]:
reflect.mapaccess_faststr(0x1999240?, 0x18?, {0xc0165e0f20?, 0x1999240?})
/usr/local/go/src/runtime/map.go:1343 +0x1e
reflect.Value.MapIndex({0x1a5bca0?, 0xc006cb6620?, 0x4b078a?}, {0x1999240, 0xc00c0abc90, 0x98})
/usr/local/go/src/reflect/value.go:1664 +0xc5
reflect.deepValueEqual({0x1a5bca0?, 0xc010371ee0?, 0xc0017a1820?}, {0x1a5bca0?, 0xc006cb6620?, 0xc0017a1860?}, 0x80000000000?)
/usr/local/go/src/reflect/deepequal.go:147 +0x149e
reflect.deepValueEqual({0x1bb2440?, 0xc010371ec0?, 0xc000240660?}, {0x1bb2440?, 0xc006cb6600?, 0xc00069a330?}, 0x1cac260?)
/usr/local/go/src/reflect/deepequal.go:130 +0x1296
reflect.DeepEqual({0x1bb2440?, 0xc010371ec0?}, {0x1bb2440?, 0xc006cb6600?})
/usr/local/go/src/reflect/deepequal.go:237 +0x2c5
github.com/projectcalico/calico/kube-controllers/pkg/cache.(*calicoCache).Set(0xc0007c0140, {0xc0159d60a0, 0x45}, {0x1bb2440, 0xc006cb6600})
/go/src/github.com/projectcalico/calico/kube-controllers/pkg/cache/cache.go:134 +0x205
github.com/projectcalico/calico/kube-controllers/pkg/controllers/pod.NewPodController.func3({0x406818?, 0xc00072e0c0?}, {0x1cf28a0?, 0xc012d05400?})
/go/src/github.com/projectcalico/calico/kube-controllers/pkg/controllers/pod/pod_controller.go:170 +0x55e
k8s.io/client-go/tools/cache.ResourceEventHandlerFuncs.OnUpdate(...)
/go/pkg/mod/k8s.io/[email protected]/tools/cache/controller.go:239
k8s.io/client-go/tools/cache.(*processorListener).run.func1()
/go/pkg/mod/k8s.io/[email protected]/tools/cache/shared_informer.go:814 +0xf7
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x0?)
/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:157 +0x3e
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc00014bf38?, {0x200eae0, 0xc0004b9f50}, 0x1, 0xc00013af00)
/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:158 +0xb6
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0x0?, 0x3b9aca00, 0x0, 0x0?, 0x0?)
/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:135 +0x89
k8s.io/apimachinery/pkg/util/wait.Until(...)
/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:92
k8s.io/client-go/tools/cache.(*processorListener).run(0xc000f8a000?)
/go/pkg/mod/k8s.io/[email protected]/tools/cache/shared_informer.go:810 +0x6b
k8s.io/apimachinery/pkg/util/wait.(*Group).Start.func1()
/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:75 +0x5a
created by k8s.io/apimachinery/pkg/util/wait.(*Group).Start
/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:73 +0x85
Your Environment
- Calico version 3.26.5
- Orchestrator version (e.g. kubernetes, mesos, rkt): k8s 1.25.16
- Operating System and version: Rocky 8
- Datastore: ETCD
- Link to your project (optional):