piraeus-operator
Error “can't read superblock on /dev/drbd1001.” when trying to start a pod
I see errors in the pod events:
$ kubectl describe pod -n monitoring kube-prometheus-stack-grafana-5fd67c647-7m5hs
Name: kube-prometheus-stack-grafana-5fd67c647-7m5hs
Namespace: monitoring
Priority: 0
Service Account: kube-prometheus-stack-grafana
Node: k8s-m0/10.1.27.210
Start Time: Mon, 03 Jun 2024 08:55:48 +0200
Labels: app.kubernetes.io/instance=kube-prometheus-stack
app.kubernetes.io/name=grafana
pod-template-hash=5fd67c647
Annotations: checksum/config: 4eb333c36d59bdac7f568835d4c71dc0e5b1ca4f0bbc06eb5379f9ca352b8914
checksum/sc-dashboard-provider-config: 593c0a8778b83f11fe80ccb21dfb20bc46705e2be3178df1dc4c89d164c8cd9c
checksum/secret: 032056e9c62bbe9d1daa41ee49cd3d9524c076f51ca4c65adadf4ef08ef28712
kubectl.kubernetes.io/default-container: grafana
Status: Pending
IP:
IPs: <none>
Controlled By: ReplicaSet/kube-prometheus-stack-grafana-5fd67c647
Init Containers:
init-chown-data:
Container ID:
Image: docker.io/library/busybox:1.31.1
Image ID:
Port: <none>
Host Port: <none>
SeccompProfile: RuntimeDefault
Command:
chown
-R
472:472
/var/lib/grafana
State: Waiting
Reason: PodInitializing
Ready: False
Restart Count: 0
Environment: <none>
Mounts:
/var/lib/grafana from storage (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-p2xnp (ro)
Containers:
grafana-sc-dashboard:
Container ID:
Image: quay.io/kiwigrid/k8s-sidecar:1.26.1
Image ID:
Port: <none>
Host Port: <none>
SeccompProfile: RuntimeDefault
State: Waiting
Reason: PodInitializing
Ready: False
Restart Count: 0
Environment:
METHOD: WATCH
LABEL: grafana_dashboard
LABEL_VALUE: 1
FOLDER: /tmp/dashboards
RESOURCE: both
NAMESPACE: ALL
REQ_USERNAME: <set to the key 'admin-user' in secret 'kube-prometheus-stack-grafana'> Optional: false
REQ_PASSWORD: <set to the key 'admin-password' in secret 'kube-prometheus-stack-grafana'> Optional: false
REQ_URL: http://localhost:3000/api/admin/provisioning/dashboards/reload
REQ_METHOD: POST
Mounts:
/tmp/dashboards from sc-dashboard-volume (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-p2xnp (ro)
grafana-sc-datasources:
Container ID:
Image: quay.io/kiwigrid/k8s-sidecar:1.26.1
Image ID:
Port: <none>
Host Port: <none>
SeccompProfile: RuntimeDefault
State: Waiting
Reason: PodInitializing
Ready: False
Restart Count: 0
Environment:
METHOD: WATCH
LABEL: grafana_datasource
LABEL_VALUE: 1
FOLDER: /etc/grafana/provisioning/datasources
RESOURCE: both
REQ_USERNAME: <set to the key 'admin-user' in secret 'kube-prometheus-stack-grafana'> Optional: false
REQ_PASSWORD: <set to the key 'admin-password' in secret 'kube-prometheus-stack-grafana'> Optional: false
REQ_URL: http://localhost:3000/api/admin/provisioning/datasources/reload
REQ_METHOD: POST
Mounts:
/etc/grafana/provisioning/datasources from sc-datasources-volume (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-p2xnp (ro)
grafana:
Container ID:
Image: docker.io/grafana/grafana:10.4.1
Image ID:
Ports: 3000/TCP, 9094/TCP, 9094/UDP
Host Ports: 0/TCP, 0/TCP, 0/UDP
SeccompProfile: RuntimeDefault
State: Waiting
Reason: PodInitializing
Ready: False
Restart Count: 0
Liveness: http-get http://:3000/api/health delay=60s timeout=30s period=10s #success=1 #failure=10
Readiness: http-get http://:3000/api/health delay=0s timeout=1s period=10s #success=1 #failure=3
Environment:
POD_IP: (v1:status.podIP)
GF_SECURITY_ADMIN_USER: <set to the key 'admin-user' in secret 'kube-prometheus-stack-grafana'> Optional: false
GF_SECURITY_ADMIN_PASSWORD: <set to the key 'admin-password' in secret 'kube-prometheus-stack-grafana'> Optional: false
GF_PATHS_DATA: /var/lib/grafana/
GF_PATHS_LOGS: /var/log/grafana
GF_PATHS_PLUGINS: /var/lib/grafana/plugins
GF_PATHS_PROVISIONING: /etc/grafana/provisioning
Mounts:
/etc/grafana/grafana.ini from config (rw,path="grafana.ini")
/etc/grafana/provisioning/dashboards/sc-dashboardproviders.yaml from sc-dashboard-provider (rw,path="provider.yaml")
/etc/grafana/provisioning/datasources from sc-datasources-volume (rw)
/tmp/dashboards from sc-dashboard-volume (rw)
/var/lib/grafana from storage (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-p2xnp (ro)
Conditions:
Type Status
Initialized False
Ready False
ContainersReady False
PodScheduled True
Volumes:
config:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: kube-prometheus-stack-grafana
Optional: false
storage:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: kube-prometheus-stack-grafana
ReadOnly: false
sc-dashboard-volume:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
sc-dashboard-provider:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: kube-prometheus-stack-grafana-config-dashboards
Optional: false
sc-datasources-volume:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
kube-api-access-p2xnp:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 49m default-scheduler Successfully assigned monitoring/kube-prometheus-stack-grafana-5fd67c647-7m5hs to k8s-m0
Warning FailedAttachVolume 49m attachdetach-controller Multi-Attach error for volume "pvc-40a7bc3f-d655-4606-a671-863913f657c0" Volume is already used by pod(s) kube-prometheus-stack-grafana-5fd67c647-dcqrx
Warning FailedMount 49m kubelet MountVolume.SetUp failed for volume "pvc-40a7bc3f-d655-4606-a671-863913f657c0" : rpc error: code = Internal desc = NodePublishVolume failed for pvc-40a7bc3f-d655-4606-a671-863913f657c0: mount failed: exit status 32
Mounting command: mount
Mounting arguments: -t ext4 -o _netdev /dev/drbd1001 /var/lib/kubelet/pods/1fc6504d-b1e0-43d1-8012-32a69210124a/volumes/kubernetes.io~csi/pvc-40a7bc3f-d655-4606-a671-863913f657c0/mount
Output: mount: /var/lib/kubelet/pods/1fc6504d-b1e0-43d1-8012-32a69210124a/volumes/kubernetes.io~csi/pvc-40a7bc3f-d655-4606-a671-863913f657c0/mount: can't read superblock on /dev/drbd1001.
dmesg(1) may have more information after failed mount system call.
Normal SuccessfulAttachVolume 4m5s (x46 over 49m) attachdetach-controller AttachVolume.Attach succeeded for volume "pvc-40a7bc3f-d655-4606-a671-863913f657c0"
Warning FailedMount 6s (x31 over 49m) kubelet MountVolume.WaitForAttach failed for volume "pvc-40a7bc3f-d655-4606-a671-863913f657c0" : volume pvc-40a7bc3f-d655-4606-a671-863913f657c0 has GET error for volume attachment csi-1e1f5fcc627e2e7c3ca2595b3593e4d17c9188e63cec5e7d8f1cbe950418e564: volumeattachments.storage.k8s.io "csi-1e1f5fcc627e2e7c3ca2595b3593e4d17c9188e63cec5e7d8f1cbe950418e564" is forbidden: User "system:node:k8s-m0" cannot get resource "volumeattachments" in API group "storage.k8s.io" at the cluster scope: no relationship found between node 'k8s-m0' and this object
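Before digging into LINSTOR, it can help to confirm which node the CSI attachment for this PV is currently bound to, since both the Multi-Attach error and the "no relationship found between node" message point at a stale attachment; a minimal check, using the PV name from the events above:
$ kubectl get volumeattachments.storage.k8s.io | grep pvc-40a7bc3f-d655-4606-a671-863913f657c0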
The cluster consists of three combined master/worker nodes: k8s-m0, k8s-m1, and k8s-m2.
$ kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
k8s-m0 Ready control-plane 28d v1.28.9 10.1.27.210 <none> Ubuntu 22.04.4 LTS 5.15.0-107-generic containerd://1.7.3
k8s-m1 Ready control-plane 28d v1.28.9 10.1.27.211 <none> Ubuntu 22.04.4 LTS 5.15.0-107-generic containerd://1.7.3
k8s-m2 Ready control-plane 28d v1.28.9 10.1.27.212 <none> Ubuntu 22.04.4 LTS 5.15.0-107-generic containerd://1.7.3
I see only one LINSTOR error report:
$ kubectl exec -ti -n piraeus-datastore deployments/linstor-controller -- /bin/bash
root@linstor-controller-797bc7456f-8mgws:/# linstor error-reports list
╭───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ Id ┊ Datetime ┊ Node ┊ Exception ┊
╞═══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ 664F7013-00000-000000 ┊ 2024-06-02 22:00:12 ┊ C|linstor-controller-797bc7456f-8mgws ┊ SocketException: Network is unreachable ┊
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
root@linstor-controller-797bc7456f-8mgws:/# linstor error-reports show 664F7013-00000-000000
ERROR REPORT 664F7013-00000-000000
============================================================
Application: LINBIT® LINSTOR
Module: Controller
Version: 1.27.0
Build ID: 8250eddde5f533facba39b4d1f77f1ef85f8521d
Build time: 2024-04-02T07:12:21+00:00
Error time: 2024-06-02 22:00:12
Node: linstor-controller-797bc7456f-8mgws
Thread: SslConnector
Access context information
Identity: PUBLIC
Role: PUBLIC
Domain: PUBLIC
Peer: Node: 'k8s-m2'
============================================================
Reported error:
===============
Category: Exception
Class name: SocketException
Class canonical name: java.net.SocketException
Generated at: Method 'pollConnect', Source file 'Net.java', Unknown line number
Error message: Network is unreachable
Error context:
I/O exception while attempting to connect to the peer
Call backtrace:
Method Native Class:Line number
pollConnect Y sun.nio.ch.Net:unknown
pollConnectNow N sun.nio.ch.Net:672
finishConnect N sun.nio.ch.SocketChannelImpl:946
establishConnection N com.linbit.linstor.netcom.TcpConnectorService:993
run N com.linbit.linstor.netcom.TcpConnectorService:728
run N java.lang.Thread:840
END OF ERROR REPORT.
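The report shows the controller failing to reach its peer on k8s-m2 ("Network is unreachable"); whether the controller currently sees all satellites online can be checked from the same shell, e.g.:
root@linstor-controller-797bc7456f-8mgws:/# linstor node list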
There is an issue with the PVC used by the above pod when the pod runs on node k8s-m0:
root@linstor-controller-797bc7456f-8mgws:/# linstor r l
╭───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ ResourceName ┊ Node ┊ Port ┊ Usage ┊ Conns ┊ State ┊ CreatedOn ┊
╞═══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ pvc-40a7bc3f-d655-4606-a671-863913f657c0 ┊ k8s-m0 ┊ ┊ Unused ┊ StandAlone(k8s-m1) ┊ Diskless ┊ 2024-05-06 16:11:35 ┊
┊ pvc-40a7bc3f-d655-4606-a671-863913f657c0 ┊ k8s-m1 ┊ ┊ Unused ┊ Connecting(k8s-m0) ┊ UpToDate ┊ 2024-05-06 16:11:31 ┊
┊ pvc-40a7bc3f-d655-4606-a671-863913f657c0 ┊ k8s-m2 ┊ ┊ InUse ┊ Ok ┊ UpToDate ┊ 2024-05-06 16:11:35 ┊
┊ pvc-335fe40b-7250-4b2a-a0b3-c9eb1780e528 ┊ k8s-m0 ┊ ┊ Unused ┊ Ok ┊ UpToDate ┊ 2024-05-29 15:11:23 ┊
┊ pvc-335fe40b-7250-4b2a-a0b3-c9eb1780e528 ┊ k8s-m1 ┊ ┊ Unused ┊ Ok ┊ UpToDate ┊ 2024-05-29 15:11:27 ┊
┊ pvc-335fe40b-7250-4b2a-a0b3-c9eb1780e528 ┊ k8s-m2 ┊ ┊ Unused ┊ Ok ┊ TieBreaker ┊ 2024-05-29 15:11:26 ┊
┊ pvc-4297b5a5-4c61-4638-a63d-729f5021d46f ┊ k8s-m0 ┊ ┊ InUse ┊ Ok ┊ UpToDate ┊ 2024-05-06 16:11:28 ┊
┊ pvc-4297b5a5-4c61-4638-a63d-729f5021d46f ┊ k8s-m1 ┊ ┊ Unused ┊ Ok ┊ Diskless ┊ 2024-05-06 16:11:35 ┊
┊ pvc-4297b5a5-4c61-4638-a63d-729f5021d46f ┊ k8s-m2 ┊ ┊ Unused ┊ Ok ┊ UpToDate ┊ 2024-05-06 16:11:34 ┊
┊ pvc-cae1b7e0-d80d-47a8-8161-53063a5ccf36 ┊ k8s-m0 ┊ ┊ Unused ┊ Ok ┊ UpToDate ┊ 2024-05-23 16:37:38 ┊
┊ pvc-cae1b7e0-d80d-47a8-8161-53063a5ccf36 ┊ k8s-m1 ┊ ┊ Unused ┊ Ok ┊ TieBreaker ┊ 2024-05-23 16:37:43 ┊
┊ pvc-cae1b7e0-d80d-47a8-8161-53063a5ccf36 ┊ k8s-m2 ┊ ┊ Unused ┊ Ok ┊ UpToDate ┊ 2024-05-23 16:37:44 ┊
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
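The StandAlone(k8s-m1) / Connecting(k8s-m0) pair suggests the DRBD connection between those two nodes is broken rather than the backing storage. The DRBD view from the nodes themselves can be inspected through the satellite pods; a sketch, assuming the per-node satellite DaemonSets are named linstor-satellite.<node> in the piraeus-datastore namespace:
$ kubectl exec -n piraeus-datastore ds/linstor-satellite.k8s-m0 -- drbdadm status pvc-40a7bc3f-d655-4606-a671-863913f657c0
$ kubectl exec -n piraeus-datastore ds/linstor-satellite.k8s-m1 -- drbdadm status pvc-40a7bc3f-d655-4606-a671-863913f657c0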
When I run dmesg -T on the k8s-m0 node, I get the following output:
[Mo Jun 3 05:16:06 2024] drbd pvc-40a7bc3f-d655-4606-a671-863913f657c0: Preparing cluster-wide state change 352557914 (2->-1 3/1)
[Mo Jun 3 05:16:06 2024] drbd pvc-40a7bc3f-d655-4606-a671-863913f657c0: Aborting cluster-wide state change 352557914 (0ms) rv = -10
[Mo Jun 3 05:16:06 2024] drbd pvc-40a7bc3f-d655-4606-a671-863913f657c0: Declined by peer k8s-m2 (id: 1), see the kernel log there
[Mo Jun 3 05:16:06 2024] drbd pvc-40a7bc3f-d655-4606-a671-863913f657c0: Preparing cluster-wide state change 2847615758 (2->-1 3/1)
[Mo Jun 3 05:16:06 2024] drbd pvc-40a7bc3f-d655-4606-a671-863913f657c0: Aborting cluster-wide state change 2847615758 (0ms) rv = -10
[Mo Jun 3 05:16:06 2024] drbd pvc-40a7bc3f-d655-4606-a671-863913f657c0: Declined by peer k8s-m2 (id: 1), see the kernel log there
[Mo Jun 3 05:16:06 2024] drbd pvc-40a7bc3f-d655-4606-a671-863913f657c0: Auto-promote failed: State change was refused by peer node (-10)
[Mo Jun 3 05:16:06 2024] EXT4-fs (drbd1001): INFO: recovery required on readonly filesystem
[Mo Jun 3 05:16:06 2024] EXT4-fs (drbd1001): write access will be enabled during recovery
[Mo Jun 3 05:16:07 2024] drbd pvc-40a7bc3f-d655-4606-a671-863913f657c0/0 drbd1001: Rejected WRITE request, not in Primary role. open_cnt:1 [mount:600441:2024-06-03_03:15:46.918]
[Mo Jun 3 05:16:07 2024] Buffer I/O error on dev drbd1001, logical block 0, lost async page write
[Mo Jun 3 05:16:07 2024] Buffer I/O error on dev drbd1001, logical block 1, lost async page write
[Mo Jun 3 05:16:07 2024] Buffer I/O error on dev drbd1001, logical block 1028, lost async page write
[Mo Jun 3 05:16:07 2024] Buffer I/O error on dev drbd1001, logical block 1043, lost async page write
[Mo Jun 3 05:16:07 2024] Buffer I/O error on dev drbd1001, logical block 1059, lost async page write
[Mo Jun 3 05:16:07 2024] Buffer I/O error on dev drbd1001, logical block 9267, lost async page write
[Mo Jun 3 05:16:07 2024] Buffer I/O error on dev drbd1001, logical block 524320, lost async page write
[Mo Jun 3 05:16:07 2024] Buffer I/O error on dev drbd1001, logical block 1572880, lost async page write
[Mo Jun 3 05:16:07 2024] Buffer I/O error on dev drbd1001, logical block 1572896, lost async page write
[Mo Jun 3 05:16:07 2024] Buffer I/O error on dev drbd1001, logical block 1581106, lost async page write
[Mo Jun 3 05:16:07 2024] JBD2: recovery failed
[Mo Jun 3 05:16:07 2024] EXT4-fs (drbd1001): error loading journal
[Mo Jun 3 05:16:10 2024] drbd pvc-40a7bc3f-d655-4606-a671-863913f657c0 k8s-m2: Preparing remote state change 502811644
[Mo Jun 3 05:16:10 2024] drbd pvc-40a7bc3f-d655-4606-a671-863913f657c0 k8s-m2: Committing remote state change 502811644 (primary_nodes=0)
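The messages above say the state change was declined by peer k8s-m2 and point to the kernel log on that node; checking it for the same resource may show why the promotion was refused. A sketch, assuming shell access to k8s-m2:
$ ssh k8s-m2 "dmesg -T | grep -E 'drbd1001|pvc-40a7bc3f'"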
Please create and attach an SOS report:
kubectl exec -n piraeus-datastore -it deploy/linstor-controller -- linstor sos-report create
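If it helps, the finished report can be copied off the controller pod with kubectl cp; in this sketch, <report-path> stands for whatever path sos-report create prints:
$ kubectl cp piraeus-datastore/linstor-controller-797bc7456f-8mgws:<report-path> ./sos-report.tar.gz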
I also see errors in the dmesg output on the k8s-m1 node:
[Mo Jun 3 12:00:36 2024] IPVS: rr: TCP 10.96.29.122:3371 - no destination available
[Mo Jun 3 12:00:36 2024] IPVS: rr: TCP [fd12::7128]:3371 - no destination available
[Mo Jun 3 12:00:36 2024] IPVS: rr: TCP 10.96.29.122:3371 - no destination available
[Mo Jun 3 12:00:36 2024] IPVS: rr: TCP [fd12::7128]:3371 - no destination available
[Mo Jun 3 12:00:37 2024] IPVS: rr: TCP [fd12::7128]:3371 - no destination available
[Mo Jun 3 12:00:37 2024] IPVS: rr: TCP 10.96.29.122:3371 - no destination available
[Mo Jun 3 12:00:37 2024] IPVS: rr: TCP [fd12::7128]:3371 - no destination available
[Mo Jun 3 12:00:37 2024] IPVS: rr: TCP [fd12::7128]:3371 - no destination available
[Mo Jun 3 12:00:38 2024] IPVS: rr: TCP [fd12::7128]:3371 - no destination available
[Mo Jun 3 12:00:38 2024] IPVS: rr: TCP 10.96.29.122:3371 - no destination available
[Mo Jun 3 12:00:41 2024] net_ratelimit: 8 callbacks suppressed
[Mo Jun 3 12:00:41 2024] IPVS: rr: TCP 10.96.29.122:3371 - no destination available
[Mo Jun 3 12:00:41 2024] IPVS: rr: TCP [fd12::7128]:3371 - no destination available
[Mo Jun 3 12:00:42 2024] IPVS: rr: TCP [fd12::7128]:3371 - no destination available
[Mo Jun 3 12:00:42 2024] IPVS: rr: TCP [fd12::7128]:3371 - no destination available
[Mo Jun 3 12:00:42 2024] IPVS: rr: TCP 10.96.29.122:3371 - no destination available
[Mo Jun 3 12:00:42 2024] IPVS: rr: TCP [fd12::7128]:3371 - no destination available
[Mo Jun 3 12:00:43 2024] IPVS: rr: TCP [fd12::7128]:3371 - no destination available
[Mo Jun 3 12:00:43 2024] IPVS: rr: TCP 10.96.29.122:3371 - no destination available
[Mo Jun 3 12:00:43 2024] IPVS: rr: TCP [fd12::7128]:3371 - no destination available
[Mo Jun 3 12:00:44 2024] IPVS: rr: TCP 10.96.29.122:3371 - no destination available
[Mo Jun 3 12:00:47 2024] net_ratelimit: 7 callbacks suppressed
[Mo Jun 3 12:00:47 2024] IPVS: rr: TCP [fd12::7128]:3371 - no destination available
[Mo Jun 3 12:00:47 2024] IPVS: rr: TCP 10.96.29.122:3371 - no destination available
[Mo Jun 3 12:00:47 2024] IPVS: rr: TCP [fd12::7128]:3371 - no destination available
10.96.29.122 is the IP of the linstor-controller service:
$ kubectl get svc -A -o wide | grep -F 10.96.29.122
piraeus-datastore linstor-controller ClusterIP 10.96.29.122 <none> 3371/TCP,3370/TCP 28d app.kubernetes.io/component=linstor-controller,app.kubernetes.io/instance=linstorcluster,app.kubernetes.io/name=piraeus-datastore
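The IPVS "no destination available" messages mean kube-proxy had no ready backend for that ClusterIP at the time, i.e. the linstor-controller pod was not ready. Whether the service currently has endpoints can be checked with, for example:
$ kubectl get endpoints -n piraeus-datastore linstor-controller
$ kubectl get pods -n piraeus-datastore -l app.kubernetes.io/component=linstor-controller -o wide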
Something is still using the resource on node k8s-m2, so the pod cannot start on k8s-m0. Check the output of mount on k8s-m2 to see where the volume is in use.
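A minimal way to run that check, assuming shell access to k8s-m2:
$ ssh k8s-m2 "mount | grep drbd"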
This was a Grafana pod that is part of the Kubernetes monitoring deployment. Kubernetes tried to restart it several times because of the LINSTOR storage issues, until it eventually started successfully on the k8s-m2 node. I have now scaled the corresponding deployment down to zero, and the problem is still there:
$ kubectl scale deployment -n monitoring kube-prometheus-stack-grafana --replicas 0
$ kubectl exec -ti -n piraeus-datastore deployments/linstor-controller -- /bin/bash
root@linstor-controller-797bc7456f-8mgws:/# linstor r l
╭───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ ResourceName ┊ Node ┊ Port ┊ Usage ┊ Conns ┊ State ┊ CreatedOn ┊
╞═══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ pvc-40a7bc3f-d655-4606-a671-863913f657c0 ┊ k8s-m0 ┊ ┊ Unused ┊ StandAlone(k8s-m1) ┊ Diskless ┊ 2024-05-06 16:11:35 ┊
┊ pvc-40a7bc3f-d655-4606-a671-863913f657c0 ┊ k8s-m1 ┊ ┊ Unused ┊ Connecting(k8s-m0) ┊ UpToDate ┊ 2024-05-06 16:11:31 ┊
┊ pvc-40a7bc3f-d655-4606-a671-863913f657c0 ┊ k8s-m2 ┊ ┊ Unused ┊ Ok ┊ UpToDate ┊ 2024-05-06 16:11:35 ┊
┊ pvc-335fe40b-7250-4b2a-a0b3-c9eb1780e528 ┊ k8s-m0 ┊ ┊ Unused ┊ Ok ┊ UpToDate ┊ 2024-05-29 15:11:23 ┊
┊ pvc-335fe40b-7250-4b2a-a0b3-c9eb1780e528 ┊ k8s-m1 ┊ ┊ Unused ┊ Ok ┊ UpToDate ┊ 2024-05-29 15:11:27 ┊
┊ pvc-335fe40b-7250-4b2a-a0b3-c9eb1780e528 ┊ k8s-m2 ┊ ┊ Unused ┊ Ok ┊ TieBreaker ┊ 2024-05-29 15:11:26 ┊
┊ pvc-2492b46b-6466-4e2d-8820-b5fa9299ad9c ┊ k8s-m0 ┊ ┊ Unused ┊ Ok ┊ TieBreaker ┊ 2024-06-04 13:08:55 ┊
┊ pvc-2492b46b-6466-4e2d-8820-b5fa9299ad9c ┊ k8s-m1 ┊ ┊ Unused ┊ Ok ┊ UpToDate ┊ 2024-06-04 13:08:52 ┊
┊ pvc-2492b46b-6466-4e2d-8820-b5fa9299ad9c ┊ k8s-m2 ┊ ┊ Unused ┊ Ok ┊ UpToDate ┊ 2024-06-04 13:08:56 ┊
┊ pvc-4297b5a5-4c61-4638-a63d-729f5021d46f ┊ k8s-m0 ┊ ┊ InUse ┊ Ok ┊ UpToDate ┊ 2024-05-06 16:11:28 ┊
┊ pvc-4297b5a5-4c61-4638-a63d-729f5021d46f ┊ k8s-m1 ┊ ┊ Unused ┊ Ok ┊ Diskless ┊ 2024-05-06 16:11:35 ┊
┊ pvc-4297b5a5-4c61-4638-a63d-729f5021d46f ┊ k8s-m2 ┊ ┊ Unused ┊ Ok ┊ UpToDate ┊ 2024-05-06 16:11:34 ┊
┊ pvc-cae1b7e0-d80d-47a8-8161-53063a5ccf36 ┊ k8s-m0 ┊ ┊ Unused ┊ Ok ┊ UpToDate ┊ 2024-05-23 16:37:38 ┊
┊ pvc-cae1b7e0-d80d-47a8-8161-53063a5ccf36 ┊ k8s-m1 ┊ ┊ Unused ┊ Ok ┊ TieBreaker ┊ 2024-05-23 16:37:43 ┊
┊ pvc-cae1b7e0-d80d-47a8-8161-53063a5ccf36 ┊ k8s-m2 ┊ ┊ Unused ┊ Ok ┊ UpToDate ┊ 2024-05-23 16:37:44 ┊
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
You can try running kubectl exec ds/linstor-satellite.k8s-m0 -- drbdadm adjust pvc-40a7bc3f-d655-4606-a671-863913f657c0 to kick things back into working order.
Thanks, but when I went to execute the command, I realized that LINSTOR had already somehow repaired itself:
$ kubectl exec -ti -n piraeus-datastore deployments/linstor-controller -- /bin/bash
root@linstor-controller-797bc7456f-8mgws:/# linstor r l
╭──────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ ResourceName ┊ Node ┊ Port ┊ Usage ┊ Conns ┊ State ┊ CreatedOn ┊
╞══════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ pvc-40a7bc3f-d655-4606-a671-863913f657c0 ┊ k8s-m0 ┊ ┊ Unused ┊ Ok ┊ Diskless ┊ 2024-05-06 16:11:35 ┊
┊ pvc-40a7bc3f-d655-4606-a671-863913f657c0 ┊ k8s-m1 ┊ ┊ Unused ┊ Ok ┊ UpToDate ┊ 2024-05-06 16:11:31 ┊
┊ pvc-40a7bc3f-d655-4606-a671-863913f657c0 ┊ k8s-m2 ┊ ┊ Unused ┊ Ok ┊ UpToDate ┊ 2024-05-06 16:11:35 ┊
┊ pvc-335fe40b-7250-4b2a-a0b3-c9eb1780e528 ┊ k8s-m0 ┊ ┊ Unused ┊ Ok ┊ UpToDate ┊ 2024-05-29 15:11:23 ┊
┊ pvc-335fe40b-7250-4b2a-a0b3-c9eb1780e528 ┊ k8s-m1 ┊ ┊ Unused ┊ Ok ┊ UpToDate ┊ 2024-05-29 15:11:27 ┊
┊ pvc-335fe40b-7250-4b2a-a0b3-c9eb1780e528 ┊ k8s-m2 ┊ ┊ Unused ┊ Ok ┊ TieBreaker ┊ 2024-05-29 15:11:26 ┊
┊ pvc-2492b46b-6466-4e2d-8820-b5fa9299ad9c ┊ k8s-m0 ┊ ┊ Unused ┊ Ok ┊ TieBreaker ┊ 2024-06-04 13:08:55 ┊
┊ pvc-2492b46b-6466-4e2d-8820-b5fa9299ad9c ┊ k8s-m1 ┊ ┊ Unused ┊ Ok ┊ UpToDate ┊ 2024-06-04 13:08:52 ┊
┊ pvc-2492b46b-6466-4e2d-8820-b5fa9299ad9c ┊ k8s-m2 ┊ ┊ Unused ┊ Ok ┊ UpToDate ┊ 2024-06-04 13:08:56 ┊
┊ pvc-4297b5a5-4c61-4638-a63d-729f5021d46f ┊ k8s-m0 ┊ ┊ Unused ┊ Ok ┊ UpToDate ┊ 2024-05-06 16:11:28 ┊
┊ pvc-4297b5a5-4c61-4638-a63d-729f5021d46f ┊ k8s-m1 ┊ ┊ Unused ┊ Ok ┊ Diskless ┊ 2024-05-06 16:11:35 ┊
┊ pvc-4297b5a5-4c61-4638-a63d-729f5021d46f ┊ k8s-m2 ┊ ┊ InUse ┊ Ok ┊ UpToDate ┊ 2024-05-06 16:11:34 ┊
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
I also tested running the Grafana deployment with a nodeSelector on each of the nodes, without problems. Can you give me any clues as to what actually happened and what your command was supposed to do?