linstor-server
Problem while creating large (100+ GB) volume
I am just trying to create a volume larger than 100 GB:
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: largepvc
  namespace: default
spec:
  storageClassName: "linstor-store-r2"
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 200Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  strategy:
    type: Recreate
  replicas: 1
  selector:
    matchLabels:
      component: nginx
  template:
    metadata:
      labels:
        component: nginx
    spec:
      containers:
        - name: nginx
          image: nginx
          command: ["/usr/sbin/nginx"]
          args:
            - -g
            - daemon off;
          volumeMounts:
            - mountPath: "/app/media"
              name: largepvc
          ports:
            - containerPort: 80
              protocol: TCP
      volumes:
        - name: largepvc
          persistentVolumeClaim:
            claimName: largepvc
LINSTOR creates the volume, but it looks like mkfs fails because of a timeout:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 18m linstor Successfully assigned default/nginx-c49998c79-lvnlx to node0
Normal SuccessfulAttachVolume 18m attachdetach-controller AttachVolume.Attach succeeded for volume "pvc-c5dda542-a0ac-4336-882a-1724f98664b0"
Warning FailedMount 101s (x16 over 18m) kubelet MountVolume.SetUp failed for volume "pvc-c5dda542-a0ac-4336-882a-1724f98664b0" : rpc error: code = Internal desc = NodePublishVolume failed for pvc-c5dda542-a0ac-4336-882a-1724f98664b0: mount failed: exit status 32
Mounting command: mount
Mounting arguments: -t ext4 -o _netdev /dev/drbd1020 /var/lib/kubelet/pods/59315b6d-3f90-4bad-b831-4af54963d3cb/volumes/kubernetes.io~csi/pvc-c5dda542-a0ac-4336-882a-1724f98664b0/mount
Output: mount: /var/lib/kubelet/pods/59315b6d-3f90-4bad-b831-4af54963d3cb/volumes/kubernetes.io~csi/pvc-c5dda542-a0ac-4336-882a-1724f98664b0/mount: wrong fs type, bad option, bad superblock on /dev/drbd1020, missing codepage or helper program, or other error.
Warning FailedMount 23s (x8 over 16m) kubelet Unable to attach or mount volumes: unmounted volumes=[largepvc], unattached volumes=[], failed to process volumes=[]: timed out waiting for the condition
Also, the resource is stuck in the InUse state on one of the diskful replicas:
root@master1:~# linstor r l -r pvc-c5dda542-a0ac-4336-882a-1724f98664b0
+---------------------------------------------------------------------------------------------------------------+
| ResourceName | Node | Port | Usage | Conns | State | CreatedOn |
|===============================================================================================================|
| pvc-c5dda542-a0ac-4336-882a-1724f98664b0 | node0 | 7017 | InUse | Ok | UpToDate | 2023-10-07 18:03:57 |
| pvc-c5dda542-a0ac-4336-882a-1724f98664b0 | node4 | 7017 | Unused | Ok | UpToDate | 2023-10-07 18:04:57 |
| pvc-c5dda542-a0ac-4336-882a-1724f98664b0 | system1 | 7017 | Unused | Ok | TieBreaker | 2023-10-07 18:04:52 |
+---------------------------------------------------------------------------------------------------------------+
This can be fixed manually by creating the filesystem and running drbdadm down/up on the node where the resource is InUse:
root@master1:~# kubectl -n d8-linstor exec -ti linstor-node-8rmmv bash
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
Defaulted container "linstor-satellite" out of: linstor-satellite, kube-rbac-proxy, drbd-prometheus-exporter
root@node0:/# linstor v l -r pvc-c5dda542-a0ac-4336-882a-1724f98664b0
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ Node ┊ Resource ┊ StoragePool ┊ VolNr ┊ MinorNr ┊ DeviceName ┊ Allocated ┊ InUse ┊ State ┊
╞════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ node0 ┊ pvc-c5dda542-a0ac-4336-882a-1724f98664b0 ┊ store ┊ 0 ┊ 1020 ┊ /dev/drbd1020 ┊ 1.28 GiB ┊ InUse ┊ UpToDate ┊
┊ node4 ┊ pvc-c5dda542-a0ac-4336-882a-1724f98664b0 ┊ store ┊ 0 ┊ 1020 ┊ /dev/drbd1020 ┊ 286.78 MiB ┊ Unused ┊ UpToDate ┊
┊ system1 ┊ pvc-c5dda542-a0ac-4336-882a-1724f98664b0 ┊ DfltDisklessStorPool ┊ 0 ┊ 1020 ┊ /dev/drbd1020 ┊ ┊ Unused ┊ TieBreaker ┊
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
root@node0:/# mkfs.ext4 -E lazy_itable_init=1 -E lazy_journal_init=1 /dev/drbd1020
mke2fs 1.46.5 (30-Dec-2021)
Discarding device blocks: done
Creating filesystem with 52428800 4k blocks and 13132800 inodes
Filesystem UUID: 8e8784d7-2b96-4d50-92ac-1a9ad8074637
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
4096000, 7962624, 11239424, 20480000, 23887872
Allocating group tables: done
Writing inode tables: done
Creating journal (262144 blocks): done
Writing superblocks and filesystem accounting information: done
root@node0:/#
root@node0:/# drbdadm down pvc-c5dda542-a0ac-4336-882a-1724f98664b0
root@node0:/# drbdadm up pvc-c5dda542-a0ac-4336-882a-1724f98664b0
root@node0:/# linstor v l -r pvc-c5dda542-a0ac-4336-882a-1724f98664b0
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ Node ┊ Resource ┊ StoragePool ┊ VolNr ┊ MinorNr ┊ DeviceName ┊ Allocated ┊ InUse ┊ State ┊
╞════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ node0 ┊ pvc-c5dda542-a0ac-4336-882a-1724f98664b0 ┊ store ┊ 0 ┊ 1020 ┊ /dev/drbd1020 ┊ 286.78 MiB ┊ Unused ┊ UpToDate ┊
┊ node4 ┊ pvc-c5dda542-a0ac-4336-882a-1724f98664b0 ┊ store ┊ 0 ┊ 1020 ┊ /dev/drbd1020 ┊ 286.78 MiB ┊ Unused ┊ UpToDate ┊
┊ system1 ┊ pvc-c5dda542-a0ac-4336-882a-1724f98664b0 ┊ DfltDisklessStorPool ┊ 0 ┊ 1020 ┊ /dev/drbd1020 ┊ ┊ Unused ┊ TieBreaker ┊
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
root@node0:/#
exit
root@master1:~# kubectl get pods
NAME READY STATUS RESTARTS AGE
nginx-c49998c79-lvnlx 0/1 ContainerCreating 0 28m
root@master1:~# kubectl delete pod nginx-c49998c79-lvnlx
pod "nginx-c49998c79-lvnlx" deleted
root@master1:~# kubectl get pods
NAME READY STATUS RESTARTS AGE
nginx-c49998c79-rpp82 1/1 Running 0 4m37s
root@master1:~# kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
largepvc Bound pvc-c5dda542-a0ac-4336-882a-1724f98664b0 200Gi RWO linstor-store-r2 33m
I tried to use lazy init parameters in the StorageClass, but it did not help:
root@master1:~# kubectl get sc linstor-store-r2 -oyaml | grep fsOpts
linstor.csi.linbit.com/fsOpts: -E lazy_itable_init=1 -E lazy_journal_init=1
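For reference, the full StorageClass looks roughly like this (a reconstruction: only the fsOpts line is copied from the output above; the pool name, replica count, and fstype are assumptions based on the class name, the `linstor v l` output, and the ext4 mount error):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: linstor-store-r2
provisioner: linstor.csi.linbit.com
parameters:
  # copied from the grep output above
  linstor.csi.linbit.com/fsOpts: "-E lazy_itable_init=1 -E lazy_journal_init=1"
  # assumed: storage pool "store" with 2 diskful replicas, ext4 filesystem
  linstor.csi.linbit.com/storagePool: "store"
  linstor.csi.linbit.com/placementCount: "2"
  csi.storage.k8s.io/fstype: ext4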
It looks like there is some timeout during volume provisioning.