linstor-server
Wrong "Allocated capacity" for FileThin backend
Hi, Linstor team,
In LINSTOR v1.4.2, a newly created volume from a FileThin pool shows 100% allocation. This needs to be fixed; otherwise, adding a resource to this volume will cause a full sync.
thanks
# linstor v l
+---------------------------------------------------------------------------------------------------------------------------------------------+
| Node | Resource | StoragePool | VolumeNr | MinorNr | DeviceName | Allocated | InUse | State |
|=============================================================================================================================================|
| k8s-worker-1 | pvc-64fc2095-298b-4c5f-a20f-b4b803650336 | DfltStorPool | 0 | 1000 | /dev/drbd1000 | 11.00 GiB | Unused | UpToDate |
+---------------------------------------------------------------------------------------------------------------------------------------------+
# linstor vd l
+--------------------------------------------------------------------------------------------+
| ResourceName | VolumeNr | VolumeMinor | Size | Gross | State |
|============================================================================================|
| pvc-64fc2095-298b-4c5f-a20f-b4b803650336 | 0 | 1000 | 11 GiB | | ok |
+--------------------------------------------------------------------------------------------+
# linstor sp l -n k8s-worker-1 -s DfltStorPool
+---------------------------------------------------------------------------------------------------------------+
| StoragePool | Node | Driver | PoolName | FreeCapacity | TotalCapacity | SupportsSnapshots | State |
|===============================================================================================================|
| DfltStorPool | k8s-worker-1 | FILE_THIN | | 94.12 GiB | 99.95 GiB | False | Ok |
+---------------------------------------------------------------------------------------------------------------+
| DfltStorPool | k8s-worker-3 | FILE_THIN | | 94.24 GiB | 99.95 GiB | False | Ok |
+---------------------------------------------------------------------------------------------------------------+
k8s-worker-3# du -sh *
2.5M /var/local/DfltStorPool/pvc-57aa6bc2-ec1d-48b9-9ca5-1f135c979794_00000.img
Hello,
Linstor executes the following commands to fetch
- free space:
df -B 1 --output=avail $FOLDER
- total capacity:
df -B 1 --output=size $FOLDER
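For illustration, the two calls above can be combined into a small sketch. The pool directory here defaults to /tmp purely as a stand-in; LINSTOR would pass the configured pool directory instead:

```shell
#!/bin/sh
# Sketch of the capacity query described above; POOL_DIR is an
# assumed stand-in for the storage pool's directory.
POOL_DIR=${POOL_DIR:-/tmp}

# df prints a header line plus one value line; tail and tr strip both.
avail=$(df -B 1 --output=avail "$POOL_DIR" | tail -n 1 | tr -d ' ')
total=$(df -B 1 --output=size  "$POOL_DIR" | tail -n 1 | tr -d ' ')

echo "free=$avail total=$total"
```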
# lvcreate --size 100M -n scratch/test -y
Logical volume "test" created.
# mkfs.xfs -m reflink=1 /dev/scratch/test
meta-data=/dev/scratch/test isize=512 agcount=4, agsize=6400 blks
= sectsz=512 attr=2, projid32bit=1
= crc=1 finobt=1, sparse=0, rmapbt=0, reflink=1
data = bsize=4096 blocks=25600, imaxpct=25
= sunit=0 swidth=0 blks
naming =version 2 bsize=4096 ascii-ci=0 ftype=1
log =internal log bsize=4096 blocks=1604, version=2
= sectsz=512 sunit=0 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
# mount /dev/scratch/test /tmp/filethin
# df -B 1 --output=avail,size /tmp/filethin
Avail 1B-blocks
92028928 98287616
# df -H --output=avail,size /tmp/filethin
Avail Size
93M 99M
That means the difference of 5.something GiB in your example most likely comes from the XFS metadata, not from the thin file, which is only 2.5M.
It is the output of linstor volume list that shows the 100% allocation. Also, the file_thin backend uses a loop device instead of an LV.
# linstor volume list
+---------------------------------------------------------------------------------------------------------------------------------------------+
| Node | Resource | StoragePool | VolumeNr | MinorNr | DeviceName | Allocated | InUse | State |
|=============================================================================================================================================|
| k8s-worker-1 | pvc-64fc2095-298b-4c5f-a20f-b4b803650336 | DfltStorPool | 0 | 1000 | /dev/drbd1000 | 11.00 GiB | Unused | UpToDate |
+---------------------------------------------------------------------------------------------------------------------------------------------+
Sorry - I misunderstood you.
You are completely right, that was a bug; it is fixed now (internally) and will be included in the next release.
hi, @ghernadi. v1.4.3 fixed the allocation display for the file backend, but it seems to introduce a new issue. Here, k8s-worker-3, 4, and 6 are CentOS; 7 and 9 are Ubuntu.
ERROR:
Description:
Node: 'k8s-worker-3', resource: 'pvc-53dcdb03-04a6-439a-bf71-a630b988e54d', volume: 0 - Device provider threw a storage exception
Details:
Command 'stat -c %B %b /DfltStorPool/pvc-74577f5f-c621-43d6-ad03-b81b14a2f321_00000.img' returned with exitcode 1.
Standard out:
Error message:
stat: cannot stat '/DfltStorPool/pvc-74577f5f-c621-43d6-ad03-b81b14a2f321_00000.img': No such file or directory
ERROR:
Description:
Node: 'k8s-worker-4', resource: 'pvc-53dcdb03-04a6-439a-bf71-a630b988e54d', volume: 0 - Device provider threw a storage exception
Details:
Command 'stat -c %B %b /DfltStorPool/pvc-74577f5f-c621-43d6-ad03-b81b14a2f321_00000.img' returned with exitcode 1.
Standard out:
Error message:
stat: cannot stat '/DfltStorPool/pvc-74577f5f-c621-43d6-ad03-b81b14a2f321_00000.img': No such file or directory
ERROR:
Description:
Node: 'k8s-worker-6', resource: 'pvc-53dcdb03-04a6-439a-bf71-a630b988e54d', volume: 0 - Device provider threw a storage exception
Details:
Command 'stat -c %B %b /DfltStorPool/pvc-74577f5f-c621-43d6-ad03-b81b14a2f321_00000.img' returned with exitcode 1.
Standard out:
Error message:
stat: cannot stat '/DfltStorPool/pvc-74577f5f-c621-43d6-ad03-b81b14a2f321_00000.img': No such file or directory
ERROR:
Description:
Node: 'k8s-worker-7', resource: 'pvc-53dcdb03-04a6-439a-bf71-a630b988e54d', volume: 0 - Device provider threw a storage exception
Details:
Command 'stat -c %B %b /DfltStorPool/pvc-74577f5f-c621-43d6-ad03-b81b14a2f321_00000.img' returned with exitcode 1.
Standard out:
Error message:
stat: cannot stat '/DfltStorPool/pvc-74577f5f-c621-43d6-ad03-b81b14a2f321_00000.img': No such file or directory
ERROR:
Description:
Node: 'k8s-worker-9', resource: 'pvc-53dcdb03-04a6-439a-bf71-a630b988e54d', volume: 0 - Device provider threw a storage exception
Details:
Command 'stat -c %B %b /var/lib/snapd/snaps/core_8592.snap' returned with exitcode 1.
Standard out:
Error message:
stat: cannot stat '/var/lib/snapd/snaps/core_8592.snap': No such file or directory
It seems to be a directory issue. The pool is at /var/lib/piraeus/storagepools/DfltStorPool
# linstor --no-utf8 sp lp k8s-worker-3 DfltStorPool
+-----------------------------------------------------------------+
| Key | Value |
|=================================================================|
| StorDriver/FileDir | /var/lib/piraeus/storagepools/DfltStorPool |
+-----------------------------------------------------------------+
I also ran the command manually on the host:
# stat -c %B %b /var/lib/piraeus/storagepools/DfltStorPool/pvc-74577f5f-c621-43d6-ad03-b81b14a2f321_00000.img
stat: cannot stat '%b': No such file or directory
512
Also, on Ubuntu, why does it stat a .snap file?
ERROR:
Description:
Node: 'k8s-worker-9', resource: 'pvc-53dcdb03-04a6-439a-bf71-a630b988e54d', volume: 0 - Device provider threw a storage exception
Details:
Command 'stat -c %B %b /var/lib/snapd/snaps/core_8592.snap' returned with exitcode 1.
Standard out:
Error message:
stat: cannot stat '/var/lib/snapd/snaps/core_8592.snap': No such file or directory
I will look into the wrong directory issue.
Regarding manually executing the command: unfortunately the log just joins the strings, breaking the previously correct encapsulation of the parameters...
What is logged as stat -c %B %b <dir> is actually executed as stat -c "%B %b" <dir>
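A quick shell demonstration of that quoting difference, using a throwaway scratch file in place of a real backing image:

```shell
#!/bin/sh
# Quoting demo: '%B %b' as ONE format string vs. the unquoted form
# shown in the log, where %b is parsed as a (nonexistent) file name.
f=$(mktemp)

# Correct invocation: one format string, prints "<blocksize> <blocks>".
stat -c '%B %b' "$f"

# Log-style invocation: %B is the format, %b becomes an operand.
# stat fails on '%b' and returns exit code 1.
if ! stat -c %B %b "$f" 2>/dev/null; then
    echo "unquoted form failed as expected"
fi

rm -f "$f"
```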
hi @ghernadi As of 1.5.2, this issue still persists.
Hello, sorry for the late response.
I did a quick test, but I cannot reproduce this issue, although I haven't tried it with k8s.
linstor n c bravo
ssh root@bravo lvcreate --size 100M -n scratch/test -y
ssh root@bravo mkfs.xfs -m reflink=1 /dev/scratch/test
ssh root@bravo mkdir -p /tmp/filethin
ssh root@bravo mount /dev/scratch/test /tmp/filethin
linstor n l
linstor sp c filethin bravo sp1 /tmp/filethin/sp1
linstor rd c rsc1 -l storage
linstor vd c rsc1 10m
linstor r c bravo rsc1 -s sp1
ssh root@bravo losetup
NAME SIZELIMIT OFFSET AUTOCLEAR RO BACK-FILE DIO LOG-SEC
/dev/loop0 0 0 0 0 /tmp/filethin/sp1/rsc1_00000.img 0 512
LINSTOR basically also performs a similar losetup call (to be more precise, LINSTOR calls losetup -l -O NAME,BACK-FILE) and iterates the list of BACK-FILE entries to determine which LINSTOR volumes already exist and have a loop device on top of them.
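That scan could be sketched as follows. Since losetup needs privileges, its output is mocked here with a here-document; the pool path and the filter logic are illustrative assumptions, not LINSTOR's actual implementation:

```shell
#!/bin/sh
# Sketch: list loop devices and keep only those whose BACK-FILE lives
# under the pool directory. The paths reuse examples from this issue.
POOL_DIR=/var/lib/piraeus/storagepools/DfltStorPool

# The real case would be: losetup -l -O NAME,BACK-FILE
mock_losetup() {
cat <<'EOF'
NAME       BACK-FILE
/dev/loop0 /var/lib/piraeus/storagepools/DfltStorPool/rsc1_00000.img
/dev/loop1 /var/lib/snapd/snaps/core_8592.snap
EOF
}

# Skip the header row; keep only loop devices whose backing file lives
# under the pool directory (such a filter would avoid stat-ing
# unrelated files like *.snap).
mock_losetup | awk -v dir="$POOL_DIR/" \
    'NR > 1 && index($2, dir) == 1 { print $1, $2 }'
```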
That is also the source of stat-ing the *.snap files. I can try to get rid of those unnecessary stat calls, but we seem to have a more difficult issue here.
My guess right now is that in your case the output of losetup does not actually show /var/lib/... but simply /DfltStorPool/.... This would at least explain why LINSTOR tries to stat this file.
The other theoretical possibility is that LINSTOR somehow ignores the base path (in your case /var/lib/...) when building the path. Right now I don't think that this is the case, as that should also have been triggered by my test - or there are still some circumstances I am missing in my tests...
Anyway, for now I'd like to ask you for some ErrorReports (especially of those failed stat commands, just to verify the stacktrace), and also whether you could get into the satellite container, manually execute losetup, and see if the paths are fine or not.
If the losetup output looks fine, I'm afraid I'd need TRACE logs from the satellite. I know this is a bit difficult due to the k8s setup, but for now you'd have to pass --log-level TRACE when starting the satellite. We are already working on a more convenient way to modify the log level.
hi @ghernadi, 1.5.2 seems unable to create the loop device at all.
$ linstor v l
+----------------------------------------------------------------------------------------------------------------------------------------+
| Node | Resource | StoragePool | VolumeNr | MinorNr | DeviceName | Allocated | InUse | State |
|========================================================================================================================================|
| k8s-worker-1 | pvc-a1fc73fa-b07a-4de9-8f44-fa5916a28ca5 | DfltStorPool | 0 | | /dev/loop0 | 4 KiB | | Unknown |
| k8s-worker-2 | pvc-a1fc73fa-b07a-4de9-8f44-fa5916a28ca5 | DfltStorPool | 0 | | /dev/loop0 | 4 KiB | | Unknown |
| k8s-worker-3 | pvc-a1fc73fa-b07a-4de9-8f44-fa5916a28ca5 | DfltStorPool | 0 | | /dev/loop0 | 4 KiB | | Unknown |
| k8s-worker-4 | pvc-a1fc73fa-b07a-4de9-8f44-fa5916a28ca5 | DfltStorPool | 0 | | /dev/loop0 | 4 KiB | | Unknown |
| k8s-worker-5 | pvc-a1fc73fa-b07a-4de9-8f44-fa5916a28ca5 | DfltStorPool | 0 | | /dev/loop0 | 4 KiB | | Unknown |
| k8s-worker-6 | pvc-a1fc73fa-b07a-4de9-8f44-fa5916a28ca5 | DfltStorPool | 0 | | /dev/loop0 | 4 KiB | | Unknown |
| k8s-worker-7 | pvc-a1fc73fa-b07a-4de9-8f44-fa5916a28ca5 | DfltStorPool | 0 | | /dev/loop0 | 4 KiB | | Unknown |
| k8s-worker-8 | pvc-a1fc73fa-b07a-4de9-8f44-fa5916a28ca5 | DfltStorPool | 0 | | None | | | Error |
| k8s-worker-9 | pvc-a1fc73fa-b07a-4de9-8f44-fa5916a28ca5 | DfltStorPool | 0 | | None | | | Error |
+----------------------------------------------------------------------------------------------------------------------------------------+
$ losetup -l /dev/loop0
NAME SIZELIMIT OFFSET AUTOCLEAR RO BACK-FILE
/dev/loop0 0 0
$ kubectl -n piraeus-system exec -it piraeus-node-zsw78 -- ls -a /var/lib/piraeus/storagepools/DfltStorPool
. ..
The satellite trace is attached: satellite-debug.txt
@ghernadi Please try to recreate the issue using a file-thin pool instead of an LVM pool.
In my example above I am manually creating an LV using LVM; this is not done by LINSTOR. For the test I create the LV and put an XFS on top of it, such that the filesystem in use is guaranteed to support "snapshots". The example above does use FileThin as the provider, as shown by the command
linstor sp c filethin bravo sp1 /tmp/filethin/sp1
^^^^^^^^
From what I see in the logs you provided, the file-thin pool works as intended. Here are my findings with some comments:
When the controller connects, the satellite checks whether the filesystem the FileThin storage pool is based on supports snapshots:
08:13:40.311 [MainWorkerPool-1] DEBUG LINSTOR/Satellite - SYSTEM - Executing command: cp --reflink=always /var/lib/piraeus/storagepools/DfltStorPool/LinstorSnapshotTestSource.img /var/lib/piraeus/storagepools/DfltStorPool/LinstorSnapshotTestTarget.img
08:13:40.501 [Thread-28] TRACE LINSTOR/Satellite - SYSTEM - cp: failed to clone '/var/lib/piraeus/storagepools/DfltStorPool/LinstorSnapshotTestTarget.img' from '/var/lib/piraeus/storagepools/DfltStorPool/LinstorSnapshotTestSource.img': Operation not supported
08:13:40.503 [MainWorkerPool-1] TRACE LINSTOR/Satellite - SYSTEM - External command finished in 38ms: cp --reflink=always /var/lib/piraeus/storagepools/DfltStorPool/LinstorSnapshotTestSource.img /var/lib/piraeus/storagepools/DfltStorPool/LinstorSnapshotTestTarget.img
-> no snapshot support - this is fine, as long as you... well... don't try creating snapshots :)
Later, when the resource pvc-a1fc73fa-b07a-4de9-8f44-fa5916a28ca5 is being created, the satellite checks whether the resource already exists:
08:51:22.010 [DeviceManager] DEBUG LINSTOR/Satellite - SYSTEM - Executing command: losetup -l -O NAME,BACK-FILE
08:51:22.408 [DeviceManager] TRACE LINSTOR/Satellite - SYSTEM - External command finished in 302ms: losetup -l -O NAME,BACK-FILE
-> no resources on the satellite
Therefore, the backing file for the resource is created:
08:51:22.505 [DeviceManager] DEBUG LINSTOR/Satellite - SYSTEM - Executing command: truncate -s 10485760KiB /var/lib/piraeus/storagepools/DfltStorPool/pvc-a1fc73fa-b07a-4de9-8f44-fa5916a28ca5_00000.img
08:51:22.609 [DeviceManager] TRACE LINSTOR/Satellite - SYSTEM - External command finished in 38ms: truncate -s 10485760KiB /var/lib/piraeus/storagepools/DfltStorPool/pvc-a1fc73fa-b07a-4de9-8f44-fa5916a28ca5_00000.img
and a loop device is put on top of it:
08:51:22.609 [DeviceManager] DEBUG LINSTOR/Satellite - SYSTEM - Executing command: losetup -f --show /var/lib/piraeus/storagepools/DfltStorPool/pvc-a1fc73fa-b07a-4de9-8f44-fa5916a28ca5_00000.img
08:51:22.706 [Thread-45] TRACE LINSTOR/Satellite - SYSTEM - /dev/loop0
08:51:22.707 [DeviceManager] TRACE LINSTOR/Satellite - SYSTEM - External command finished in 97ms: losetup -f --show /var/lib/piraeus/storagepools/DfltStorPool/pvc-a1fc73fa-b07a-4de9-8f44-fa5916a28ca5_00000.img
08:51:22.714 [DeviceManager] TRACE LINSTOR/Satellite - SYSTEM - Waiting until device [/dev/loop0] appears (up to 500ms)
08:51:22.715 [DeviceManager] TRACE LINSTOR/Satellite - SYSTEM - Device [/dev/loop0] appeared after 6ms
Then the backing file is stat-ed to determine the current allocation size of the thin "volume":
08:51:22.715 [DeviceManager] DEBUG LINSTOR/Satellite - SYSTEM - Executing command: stat -c %B %b /var/lib/piraeus/storagepools/DfltStorPool/pvc-a1fc73fa-b07a-4de9-8f44-fa5916a28ca5_00000.img
08:51:22.874 [Thread-48] TRACE LINSTOR/Satellite - SYSTEM - 512 0
08:51:22.875 [DeviceManager] TRACE LINSTOR/Satellite - SYSTEM - External command finished in 3ms: stat -c %B %b /var/lib/piraeus/storagepools/DfltStorPool/pvc-a1fc73fa-b07a-4de9-8f44-fa5916a28ca5_00000.img
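The two steps above (truncate, then stat) can be reproduced with a throwaway sparse file; the allocated bytes are %B (the byte size stat assumes per %b block) times %b (blocks actually allocated):

```shell
#!/bin/sh
# Sketch of the allocation check above: a sparse file created with
# truncate occupies (almost) no blocks until data is written to it.
img=$(mktemp)
truncate -s 10M "$img"

# %B = byte size per block as counted by %b, %b = allocated blocks.
set -- $(stat -c '%B %b' "$img")
echo "virtual size: 10M, allocated bytes: $(( $1 * $2 ))"

rm -f "$img"
```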
In the next run of the device manager, Linstor checks if a loop device already exists on top of the desired backing device:
08:51:23.600 [DeviceManager] DEBUG LINSTOR/Satellite - SYSTEM - Executing command: losetup -l -O NAME,BACK-FILE
08:51:23.608 [Thread-62] TRACE LINSTOR/Satellite - SYSTEM - NAME BACK-FILE
08:51:23.608 [Thread-62] TRACE LINSTOR/Satellite - SYSTEM - /dev/loop0 /var/lib/piraeus/storagepools/DfltStorPool/pvc-a1fc73fa-b07a-4de9-8f44-fa5916a28ca5_00000.img
08:51:23.608 [DeviceManager] TRACE LINSTOR/Satellite - SYSTEM - External command finished in 7ms: losetup -l -O NAME,BACK-FILE
-> exists, nothing to do.
This check is repeated several times in different device-manager runs - the satellite always checks whether everything is as expected whenever anything changes on the ResourceDefinition level. My guess here is that you created the same resource on other nodes - that triggers all satellites to recheck their local resource of the changed resource definition.
That is, until 08:51:37, where you seem to delete the resource:
08:51:37.375 [DeviceManager] DEBUG LINSTOR/Satellite - SYSTEM - Executing command: losetup -l -O NAME,BACK-FILE
08:51:37.377 [Thread-164] TRACE LINSTOR/Satellite - SYSTEM - NAME BACK-FILE
08:51:37.377 [Thread-164] TRACE LINSTOR/Satellite - SYSTEM - /dev/loop0 /var/lib/piraeus/storagepools/DfltStorPool/pvc-a1fc73fa-b07a-4de9-8f44-fa5916a28ca5_00000.img
08:51:37.377 [DeviceManager] TRACE LINSTOR/Satellite - SYSTEM - External command finished in 2ms: losetup -l -O NAME,BACK-FILE
08:51:37.377 [DeviceManager] DEBUG LINSTOR/Satellite - SYSTEM - Executing command: stat -c %B %b /var/lib/piraeus/storagepools/DfltStorPool/pvc-a1fc73fa-b07a-4de9-8f44-fa5916a28ca5_00000.img
08:51:37.379 [Thread-166] TRACE LINSTOR/Satellite - SYSTEM - 512 8
08:51:37.380 [DeviceManager] TRACE LINSTOR/Satellite - SYSTEM - External command finished in 2ms: stat -c %B %b /var/lib/piraeus/storagepools/DfltStorPool/pvc-a1fc73fa-b07a-4de9-8f44-fa5916a28ca5_00000.img
08:51:37.380 [DeviceManager] TRACE LINSTOR/Satellite - SYSTEM - Layer 'StorageLayer' finished preparing 1 resources, 0 snapshots
08:51:37.381 [DeviceManager] TRACE LINSTOR/Satellite - SYSTEM - Layer 'StorageLayer' processing resource 'pvc-a1fc73fa-b07a-4de9-8f44-fa5916a28ca5'
08:51:37.381 [DeviceManager] TRACE LINSTOR/Satellite - SYSTEM - Lv pvc-a1fc73fa-b07a-4de9-8f44-fa5916a28ca5_00000.img found
08:51:37.381 [DeviceManager] TRACE LINSTOR/Satellite - SYSTEM - Lv pvc-a1fc73fa-b07a-4de9-8f44-fa5916a28ca5_00000.img will be deleted
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
08:51:37.382 [DeviceManager] DEBUG LINSTOR/Satellite - SYSTEM - Executing command: wipefs -a -f /dev/loop0
08:51:37.473 [DeviceManager] TRACE LINSTOR/Satellite - SYSTEM - External command finished in 90ms: wipefs -a -f /dev/loop0
08:51:37.473 [DeviceManager] DEBUG LINSTOR/Satellite - SYSTEM - Executing command: losetup -d /dev/loop0
^^^^^^^^^^^^^^^^^^^^^
08:51:37.477 [DeviceManager] TRACE LINSTOR/Satellite - SYSTEM - External command finished in 2ms: losetup -d /dev/loop0
08:51:37.478 [DeviceManager] TRACE LINSTOR/Satellite - SYSTEM - Layer 'StorageLayer' finished processing resource 'pvc-a1fc73fa-b07a-4de9-8f44-fa5916a28ca5'
Deleting the backing file is done with Java's internal File.delete call, which we apparently do not log yet.
To summarize - this log actually shows that the FileThin provider seems to be working correctly, including correct backing-file paths from losetup...
Which still leaves the question open why LINSTOR tried to stat the wrong backing file, as you mentioned in your comment:
Command 'stat -c %B %b /DfltStorPool/pvc-74577f5f-c621-43d6-ad03-b81b14a2f321_00000.img' returned with exitcode 1.
On the other hand - that error did not trigger in this run... So there still has to be something different...
Or am I missing something?
The whole process is "k8s dynamic provisioning" via linstor-csi. So something must have told linstor-csi that creation failed.
Besides, even after the log records the volume deletion, linstor v l still shows the resource on the node.
LINSTOR version 1.4.2 still works with a FileThin pool and CSI, except for the "Allocated capacity". So what is wrong in 1.5.2 must be something that changed since 1.4.2.
Besides, even after the log records volume deletion, linstor v l still shows the resource on the node.
I assume the resource is in DELETING state?
yes. It is.
linstor rd l
+----------------------------------------------------------------------------+
| ResourceName | Port | ResourceGroup | State |
|============================================================================|
| pvc-a1fc73fa-b07a-4de9-8f44-fa5916a28ca5 | | DfltRscGrp | DELETING |
+----------------------------------------------------------------------------+
@ghernadi Is it possible to temporarily roll back the related code to that of v1.4.2? We can try to fix the "Allocated capacity" another day, since it is a non-blocking issue.
I just want to be sure - the log file you posted here - does that come from a satellite from within a k8s container? or was that satellite manually started without k8s?
Regarding a rollback - you can of course try to go back to version v1.4.2 (or 1.4.3 where the allocated capacity bug was also resolved).
From our side there were exactly 3 commits since 1.4.2 touching the File- or FileThinProvider or the FileCommands class (the latter contains the calls to losetup and such). Two of the commits were simple refactoring and the third is the fix for the allocated-capacity bug.
Not sure what we could / should rollback on our side...
Yes, it is from kubectl logs, with no manual intervention.
The problem starts with v1.4.3 and persists to v1.5.2
@alexzhc How are you mapping the directory for the filethin pool? That is, what volume mounts do you have for the node pods?
hi, @JoelColledge
I mapped the entire /var/lib/piraeus directory (rw) into the satellite pod, and then ran mkdir -vp /var/lib/piraeus/storagepools/DfltStorPool
volumeMounts:
- name: var-lib-piraeus
  mountPath: /var/lib/piraeus
volumes:
- name: var-lib-piraeus
  hostPath:
    path: /var/lib/piraeus
The storage class I use has a single replica. It works as expected with v1.4.2.
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: piraeus-dflt-raw
provisioner: linstor.csi.linbit.com
allowVolumeExpansion: true
reclaimPolicy: Delete
parameters:
  layerlist: storage
  placementCount: "1"
  placementPolicy: FollowTopology
  allowRemoteVolumeAccess: "false"
  disklessOnRemaining: "false"
  mountOpts: noatime,discard
  storagePool: DfltStorPool
In v1.4.3, after the "allocated capacity" issue was fixed, CSI keeps creating/deleting resources on each node.
LINSTOR also displays stat-related error messages, due either to an incomplete path, such as
stat: cannot stat '/DfltStorPool/pvc-74577f5f-c621-43d6-ad03-b81b14a2f321_00000.img': No such file or directory
or to a wrong file, such as
Command 'stat -c %B %b /var/lib/snapd/snaps/core_8592.snap' returned with exitcode 1
My guess is that the stat error somehow tricked linstor-csi into believing that resource creation had failed. linstor-csi then keeps deleting the "failed" resource and retries creating the resource on a new node.
With LINSTOR v1.6.1, the drbd layer also reports the same error after a hard reboot.
linstor v l
+----------------------------------------------------------------------------------------------------------------------------------------------+
| Node | Resource | StoragePool | VolumeNr | MinorNr | DeviceName | Allocated | InUse | State |
|==============================================================================================================================================|
| k8s-worker-4 | pvc-1233dc45-f792-48ed-a07d-83d7dba3776b | DfltStorPool | 0 | 1000 | /dev/drbd1000 | 137.41 MiB | Unused | UpToDate |
| k8s-worker-5 | pvc-1233dc45-f792-48ed-a07d-83d7dba3776b | DfltStorPool | 0 | 1000 | /dev/drbd1000 | 137.40 MiB | InUse | UpToDate |
| k8s-worker-6 | pvc-1233dc45-f792-48ed-a07d-83d7dba3776b | DfltStorPool | 0 | 1000 | /dev/drbd1000 | | | Error |
+----------------------------------------------------------------------------------------------------------------------------------------------+
ERROR:
Description:
Node: 'k8s-worker-6', resource: 'pvc-1233dc45-f792-48ed-a07d-83d7dba3776b', volume: 0 - Device provider threw a storage exception
Details:
Command 'stat -c %B %b /storagepools/DfltStorPool/pvc-1233dc45-f792-48ed-a07d-83d7dba3776b_00000.img' returned with exitcode 1.
Standard out:
Error message:
stat: cannot stat '/storagepools/DfltStorPool/pvc-1233dc45-f792-48ed-a07d-83d7dba3776b_00000.img': No such file or directory