"cluster-logging-operator" pod keeps restarting with "fatal error: concurrent map read and map write"
Describe the bug
Hello. We're facing an issue where the "cluster-logging-operator" pod has been restarted 100 times in the past 6 months, always with the same error: "fatal error: concurrent map read and map write". Our openshift-logging is configured with a ClusterLogging and a ClusterLogForwarder forwarding logs to three Kafka brokers.
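For context on the error itself: "fatal error: concurrent map read and map write" is the Go runtime aborting the process because one goroutine read a plain map while another goroutine wrote to it without synchronization. A minimal sketch of that failure class (illustrative only, not the operator's actual code):

```go
package main

// Two goroutines touch the same plain map with no synchronization.
// The Go runtime detects the unsynchronized access and aborts with
// "fatal error: concurrent map read and map write".
func main() {
	m := map[string]int{}

	// Writer goroutine: mutates the map in a tight loop.
	go func() {
		for i := 0; ; i++ {
			m["key"] = i
		}
	}()

	// Reader: races with the writer until the runtime kills the process.
	for {
		_ = m["key"]
	}
}
```

Unlike a panic, this fatal error cannot be caught with recover(), so the whole process dies and Kubernetes restarts the pod, which matches the crash/restart pattern we are seeing.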
Environment
- Versions of OpenShift, Cluster Logging and any other relevant components
Client Version: 4.7.3
Server Version: 4.10.53
Kubernetes Version: v1.23.12+8a6bfe4
```
oc get deployment.apps/cluster-logging-operator -o yaml | grep version
operatorframework.io/properties: '{"properties":[{"type":"olm.gvk","value":{"group":"logging.openshift.io","kind":"ClusterLogForwarder","version":"v1"}},{"type":"olm.gvk","value":{"group":"logging.openshift.io","kind":"ClusterLogging","version":"v1"}},{"type":"olm.maxOpenShiftVersion","value":4.12},{"type":"olm.package","value":{"packageName":"cluster-logging","version":"5.5.9"}}]}'
```
- ClusterLogging instance
```yaml
apiVersion: logging.openshift.io/v1
kind: ClusterLogging
metadata:
  annotations:
    clusterlogging.openshift.io/logforwardingtechpreview: enabled
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"logging.openshift.io/v1","kind":"ClusterLogging","metadata":{"annotations":{"clusterlogging.openshift.io/logforwardingtechpreview":"enabled"},"name":"instance","namespace":"openshift-logging"},"spec":{"collection":{"logs":{"fluentd":{},"type":"fluentd"}},"managementState":"Unmanaged"}}
  creationTimestamp: "2021-07-27T14:40:14Z"
  generation: 5
  managedFields:
  - apiVersion: logging.openshift.io/v1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .: {}
          f:clusterlogging.openshift.io/logforwardingtechpreview: {}
          f:kubectl.kubernetes.io/last-applied-configuration: {}
      f:spec:
        .: {}
        f:collection:
          .: {}
          f:logs:
            .: {}
            f:fluentd: {}
            f:type: {}
    manager: kubectl-client-side-apply
    operation: Update
    time: "2021-07-27T14:40:53Z"
  - apiVersion: logging.openshift.io/v1
    fieldsType: FieldsV1
    fieldsV1:
      f:spec:
        f:collection:
          f:logs:
            f:fluentd:
              f:resources: {}
      f:status:
        .: {}
        f:clusterConditions: {}
        f:collection:
          .: {}
          f:logs:
            .: {}
            f:fluentdStatus:
              .: {}
              f:daemonSet: {}
              f:nodes:
                .: {}
                f:fluentd-2hcrj: {}
                f:fluentd-2kbxm: {}
                f:fluentd-4dg7r: {}
                f:fluentd-4v7qs: {}
                f:fluentd-5jmhk: {}
                f:fluentd-84kkk: {}
                f:fluentd-8dp6m: {}
                f:fluentd-8wncg: {}
                f:fluentd-8wv7k: {}
                f:fluentd-8xrwk: {}
                f:fluentd-47jbr: {}
                f:fluentd-cp8gm: {}
                f:fluentd-f57pt: {}
                f:fluentd-gl8bb: {}
                f:fluentd-gsgm9: {}
                f:fluentd-hmkm9: {}
                f:fluentd-jjjpv: {}
                f:fluentd-lbn4k: {}
                f:fluentd-lkxvh: {}
                f:fluentd-mvq7m: {}
                f:fluentd-n7q9b: {}
                f:fluentd-p7n7x: {}
                f:fluentd-pbjh9: {}
                f:fluentd-rnzn6: {}
                f:fluentd-rrntm: {}
                f:fluentd-s925v: {}
                f:fluentd-t5hsx: {}
                f:fluentd-xg7gq: {}
                f:fluentd-xkhmj: {}
                f:fluentd-xmpht: {}
              f:pods:
                .: {}
                f:failed: {}
                f:notReady: {}
                f:ready: {}
        f:curation: {}
        f:logStore: {}
        f:visualization: {}
    manager: cluster-logging-operator
    operation: Update
    time: "2021-07-27T14:47:12Z"
  - apiVersion: logging.openshift.io/v1
    fieldsType: FieldsV1
    fieldsV1:
      f:spec:
        f:managementState: {}
    manager: Mozilla
    operation: Update
    time: "2021-07-27T14:47:37Z"
  name: instance
  namespace: openshift-logging
  resourceVersion: "12835895"
  uid: d29a1c1d-2c74-4e49-928e-62ba89487d84
spec:
  collection:
    logs:
      fluentd: {}
      type: fluentd
  managementState: Unmanaged
status:
  collection:
    logs:
      fluentdStatus:
        daemonSet: fluentd
        nodes:
          fluentd-2hcrj: ocp-master-1.internal-url.org
          fluentd-2kbxm: ocp-master-5.internal-url.org
          fluentd-4v7qs: ocp-worker-4.internal-url.org
          fluentd-5jmhk: ocp-worker-2.internal-url.org
          fluentd-47jbr: ocp-worker-12.internal-url.org
          fluentd-4dg7r: ocp-worker-21.internal-url.org
          fluentd-84kkk: ocp-worker-11.internal-url.org
          fluentd-8dp6m: ocp-worker-1.internal-url.org
          fluentd-8wncg: ocp-worker-17.internal-url.org
          fluentd-8wv7k: ocp-worker-16.internal-url.org
          fluentd-8xrwk: ocp-worker-8.internal-url.org
          fluentd-cp8gm: ocp-worker-10.internal-url.org
          fluentd-f57pt: ocp-worker-18.internal-url.org
          fluentd-gl8bb: ocp-worker-23.internal-url.org
          fluentd-gsgm9: ocp-master-4.internal-url.org
          fluentd-hmkm9: ocp-worker-15.internal-url.org
          fluentd-jjjpv: ocp-worker-22.internal-url.org
          fluentd-lbn4k: ocp-master-3.internal-url.org
          fluentd-lkxvh: ocp-worker-19.internal-url.org
          fluentd-mvq7m: ocp-worker-5.internal-url.org
          fluentd-n7q9b: ocp-worker-3.internal-url.org
          fluentd-p7n7x: ocp-worker-25.internal-url.org
          fluentd-pbjh9: ocp-worker-13.internal-url.org
          fluentd-rnzn6: ocp-master-2.internal-url.org
          fluentd-rrntm: ocp-worker-6.internal-url.org
          fluentd-s925v: ocp-worker-24.internal-url.org
          fluentd-t5hsx: ocp-worker-7.internal-url.org
          fluentd-xg7gq: ocp-worker-14.internal-url.org
          fluentd-xkhmj: ocp-worker-20.internal-url.org
          fluentd-xmpht: ocp-worker-9.internal-url.org
        pods:
          failed: []
          notReady: []
          ready:
          - fluentd-2hcrj
          - fluentd-2kbxm
          - fluentd-47jbr
          - fluentd-4dg7r
          - fluentd-4v7qs
          - fluentd-5jmhk
          - fluentd-84kkk
          - fluentd-8dp6m
          - fluentd-8wncg
          - fluentd-8wv7k
          - fluentd-8xrwk
          - fluentd-cp8gm
          - fluentd-f57pt
          - fluentd-gl8bb
          - fluentd-gsgm9
          - fluentd-hmkm9
          - fluentd-jjjpv
          - fluentd-lbn4k
          - fluentd-lkxvh
          - fluentd-mvq7m
          - fluentd-n7q9b
          - fluentd-p7n7x
          - fluentd-pbjh9
          - fluentd-rnzn6
          - fluentd-rrntm
          - fluentd-s925v
          - fluentd-t5hsx
          - fluentd-xg7gq
          - fluentd-xkhmj
          - fluentd-xmpht
  curation: {}
  logStore: {}
  visualization: {}
```
Logs
cluster-logging-operator.log (attached)
Expected behavior
The "cluster-logging-operator" pod does not crash or restart.
Actual behavior
The pod crashes with the fatal error above and restarts after an unpredictable amount of time.
To Reproduce
We cannot reproduce it consistently; the pod crashes seemingly at random after some period of time.
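For whoever triages this: the timing-dependent nature is typical of a data race, and the Go race detector flags such races deterministically even when the crashing schedule never happens to occur. A hedged sketch of a test that would surface it (the test and the map are illustrative, not taken from the operator's code base):

```go
package example

import (
	"sync"
	"testing"
)

// TestMapRace demonstrates how `go test -race` reports an unsynchronized
// map read/write on every run, without needing the production crash to occur.
func TestMapRace(t *testing.T) {
	m := map[string]int{}
	var wg sync.WaitGroup
	wg.Add(2)

	go func() { // writer
		defer wg.Done()
		for i := 0; i < 1000; i++ {
			m["key"] = i
		}
	}()

	go func() { // reader
		defer wg.Done()
		for i := 0; i < 1000; i++ {
			_ = m["key"]
		}
	}()

	wg.Wait()
}
```

Running the affected packages with `go test -race` should report the conflicting read and write sites with stack traces, which the goroutine dump in the attached log may help narrow down.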
Additional context
Happy to provide additional information if necessary. Thank you.
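For reference, in case it helps with a fix: the conventional remedies for this error are to guard the shared map with a sync.RWMutex or to replace it with sync.Map. A minimal sketch of the mutex approach (the wrapper type is hypothetical, not the operator's actual data structure):

```go
package main

import (
	"fmt"
	"sync"
)

// safeMap is a hypothetical wrapper; the operator's real shared state
// may be shaped differently.
type safeMap struct {
	mu sync.RWMutex
	m  map[string]int
}

func (s *safeMap) get(k string) int {
	s.mu.RLock() // multiple readers may hold the read lock concurrently
	defer s.mu.RUnlock()
	return s.m[k]
}

func (s *safeMap) set(k string, v int) {
	s.mu.Lock() // writers take the exclusive lock
	defer s.mu.Unlock()
	s.m[k] = v
}

func main() {
	s := &safeMap{m: map[string]int{}}
	var wg sync.WaitGroup
	wg.Add(2)

	go func() { // writer: same workload as the crashing sketch, now safe
		defer wg.Done()
		for i := 0; i < 1000; i++ {
			s.set("key", i)
		}
	}()

	go func() { // reader
		defer wg.Done()
		for i := 0; i < 1000; i++ {
			_ = s.get("key")
		}
	}()

	wg.Wait()
	fmt.Println("final value:", s.get("key"))
}
```

sync.RWMutex suits read-heavy shared state such as operator caches; sync.Map is the usual alternative when keys are written once and then read many times across goroutines.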