
Cannot create PVC when specifying nodeList / clientList in StorageClass (storage pool 'null' for resource)

Open BokuNoGF opened this issue 3 months ago • 5 comments

Hello!

When specifying nodeList / clientList in the StorageClass to pin volumes to specific nodes, any PVC created from that StorageClass fails to initialize, and LINSTOR produces the error report below stating storage pool 'null' for resource.

If nodeList / clientList is not used, and instead autoPlace is used, the PVC is initialized correctly and the error does not occur.
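
For comparison, a minimal sketch of an autoPlace-based StorageClass of the kind that works (illustrative name, trimmed to the relevant parameters):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: linstor-dpool-xfs-64k-ap   # illustrative name
provisioner: linstor.csi.linbit.com
allowVolumeExpansion: true
parameters:
  csi.storage.k8s.io/fstype: xfs
  linstor.csi.linbit.com/storagePool: dpool
  # replaces nodeList / clientList / placementPolicy: Manual
  linstor.csi.linbit.com/autoPlace: "2"
volumeBindingMode: Immediate
reclaimPolicy: Retain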

I'm using the latest released version of the operator (v2.9.0) on Debian 13.1 (trixie), utilizing k3s v1.33.4+k3s1.

Error report:

ERROR REPORT 68C240C8-00000-000004

============================================================

Application:                        LINBIT® LINSTOR
Module:                             Controller
Version:                            1.31.2
Build ID:                           6a9a4bb37f547ff73317585a3efa217523e01f43
Build time:                         2025-06-11T11:52:24+00:00
Error time:                         2025-09-11 03:28:23
Node:                               linstor-controller-6b54c4b495-w2qqj
Thread:                             MainWorkerPool-1
Access context information

Identity:                           PUBLIC
Role:                               PUBLIC
Domain:                             PUBLIC

Peer:                               RestClient(10.42.0.46; 'linstor-csi/v1.8.0-a5640c2f879cf52d15f1826f575d9b5d69fa3d74')

============================================================

Reported error:
===============

Category:                           LinStorException
Class name:                         LinStorException
Class canonical name:               com.linbit.linstor.LinStorException
Generated at:                       Method 'checkStorPoolLoaded', Source file 'CtrlStorPoolResolveHelper.java', Line #335

Error message:                      Dependency not found

Error context:
        The storage pool 'null' for resource 'pvc-a5412897-74d5-4059-9ac4-1f11cf99ba3b' for volume number '0' is not deployed on node 'wow-red'.
ErrorContext:


Call backtrace:

    Method                                   Native Class:Line number
    checkStorPoolLoaded                      N      com.linbit.linstor.CtrlStorPoolResolveHelper:335
    resolveStorPool                          N      com.linbit.linstor.CtrlStorPoolResolveHelper:195
    resolveStorPool                          N      com.linbit.linstor.CtrlStorPoolResolveHelper:74
    createVolumeResolvingStorPool            N      com.linbit.linstor.core.apicallhandler.controller.CtrlVlmCrtApiHelper:88
    createResourceDb                         N      com.linbit.linstor.core.apicallhandler.controller.CtrlRscCrtApiHelper:470
    createResourceInTransaction              N      com.linbit.linstor.core.apicallhandler.controller.CtrlRscCrtApiCallHandler:153
    lambda$createResource$2                  N      com.linbit.linstor.core.apicallhandler.controller.CtrlRscCrtApiCallHandler:122
    doInScope                                N      com.linbit.linstor.core.apicallhandler.ScopeRunner:175
    lambda$fluxInScope$0                     N      com.linbit.linstor.core.apicallhandler.ScopeRunner:100
    call                                     N      reactor.core.publisher.MonoCallable:72
    trySubscribeScalarMap                    N      reactor.core.publisher.FluxFlatMap:127
    subscribeOrReturn                        N      reactor.core.publisher.MonoFlatMapMany:49
    subscribe                                N      reactor.core.publisher.Flux:8759
    onNext                                   N      reactor.core.publisher.MonoFlatMapMany$FlatMapManyMain:195
    request                                  N      reactor.core.publisher.Operators$ScalarSubscription:2545
    onSubscribe                              N      reactor.core.publisher.MonoFlatMapMany$FlatMapManyMain:141
    subscribe                                N      reactor.core.publisher.MonoJust:55
    subscribe                                N      reactor.core.publisher.MonoDeferContextual:55
    subscribe                                N      reactor.core.publisher.Flux:8773
    onNext                                   N      reactor.core.publisher.MonoFlatMapMany$FlatMapManyMain:195
    request                                  N      reactor.core.publisher.Operators$ScalarSubscription:2545
    onSubscribe                              N      reactor.core.publisher.MonoFlatMapMany$FlatMapManyMain:141
    subscribe                                N      reactor.core.publisher.MonoJust:55
    subscribe                                N      reactor.core.publisher.MonoDeferContextual:55
    subscribe                                N      reactor.core.publisher.Flux:8773
    onNext                                   N      reactor.core.publisher.MonoFlatMapMany$FlatMapManyMain:195
    onNext                                   N      reactor.core.publisher.FluxMapFuseable$MapFuseableSubscriber:129
    completePossiblyEmpty                    N      reactor.core.publisher.Operators$BaseFluxToMonoOperator:2071
    onComplete                               N      reactor.core.publisher.MonoCollect$CollectSubscriber:145
    onComplete                               N      reactor.core.publisher.FluxOnAssembly$OnAssemblySubscriber:549
    onComplete                               N      reactor.core.publisher.MonoFlatMapMany$FlatMapManyInner:260
    checkTerminated                          N      reactor.core.publisher.FluxFlatMap$FlatMapMain:847
    drainLoop                                N      reactor.core.publisher.FluxFlatMap$FlatMapMain:609
    drain                                    N      reactor.core.publisher.FluxFlatMap$FlatMapMain:589
    onComplete                               N      reactor.core.publisher.FluxFlatMap$FlatMapMain:466
    checkTerminated                          N      reactor.core.publisher.FluxFlatMap$FlatMapMain:847
    drainLoop                                N      reactor.core.publisher.FluxFlatMap$FlatMapMain:609
    innerComplete                            N      reactor.core.publisher.FluxFlatMap$FlatMapMain:895
    onComplete                               N      reactor.core.publisher.FluxFlatMap$FlatMapInner:998
    onComplete                               N      reactor.core.publisher.FluxMap$MapSubscriber:144
    onComplete                               N      reactor.core.publisher.Operators$MultiSubscriptionSubscriber:2205
    onComplete                               N      reactor.core.publisher.FluxUsing$UsingSubscriber:236
    onComplete                               N      reactor.core.publisher.FluxSwitchIfEmpty$SwitchIfEmptySubscriber:85
    complete                                 N      reactor.core.publisher.FluxCreate$BaseSink:460
    drain                                    N      reactor.core.publisher.FluxCreate$BufferAsyncSink:805
    complete                                 N      reactor.core.publisher.FluxCreate$BufferAsyncSink:753
    drainLoop                                N      reactor.core.publisher.FluxCreate$SerializedFluxSink:247
    drain                                    N      reactor.core.publisher.FluxCreate$SerializedFluxSink:213
    complete                                 N      reactor.core.publisher.FluxCreate$SerializedFluxSink:204
    apiCallComplete                          N      com.linbit.linstor.netcom.TcpConnectorPeer:542
    handleComplete                           N      com.linbit.linstor.proto.CommonMessageProcessor:370
    handleDataMessage                        N      com.linbit.linstor.proto.CommonMessageProcessor:300
    doProcessInOrderMessage                  N      com.linbit.linstor.proto.CommonMessageProcessor:245
    lambda$doProcessMessage$4                N      com.linbit.linstor.proto.CommonMessageProcessor:230
    subscribe                                N      reactor.core.publisher.FluxDefer:46
    subscribe                                N      reactor.core.publisher.Flux:8773
    onNext                                   N      reactor.core.publisher.FluxFlatMap$FlatMapMain:427
    drainAsync                               N      reactor.core.publisher.FluxFlattenIterable$FlattenIterableSubscriber:453
    drain                                    N      reactor.core.publisher.FluxFlattenIterable$FlattenIterableSubscriber:724
    onNext                                   N      reactor.core.publisher.FluxFlattenIterable$FlattenIterableSubscriber:256
    drainFused                               N      reactor.core.publisher.SinkManyUnicast:319
    drain                                    N      reactor.core.publisher.SinkManyUnicast:362
    tryEmitNext                              N      reactor.core.publisher.SinkManyUnicast:237
    tryEmitNext                              N      reactor.core.publisher.SinkManySerialized:100
    processInOrder                           N      com.linbit.linstor.netcom.TcpConnectorPeer:448
    doProcessMessage                         N      com.linbit.linstor.proto.CommonMessageProcessor:228
    lambda$processMessage$2                  N      com.linbit.linstor.proto.CommonMessageProcessor:165
    onNext                                   N      reactor.core.publisher.FluxPeek$PeekSubscriber:185
    runAsync                                 N      reactor.core.publisher.FluxPublishOn$PublishOnSubscriber:440
    run                                      N      reactor.core.publisher.FluxPublishOn$PublishOnSubscriber:527
    call                                     N      reactor.core.scheduler.WorkerTask:84
    call                                     N      reactor.core.scheduler.WorkerTask:37
    run                                      N      java.util.concurrent.FutureTask:264
    run                                      N      java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask:304
    runWorker                                N      java.util.concurrent.ThreadPoolExecutor:1136
    run                                      N      java.util.concurrent.ThreadPoolExecutor$Worker:635
    run                                      N      java.lang.Thread:840


END OF ERROR REPORT.

StorageClass:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: linstor-dpool-xfs-64k-rb
provisioner: linstor.csi.linbit.com
allowVolumeExpansion: true
parameters:
  csi.storage.k8s.io/fstype: xfs
  linstor.csi.linbit.com/fsOpts: "-b size=64k -s size=4k -m rmapbt=1"
  linstor.csi.linbit.com/mountOpts: "logbsize=256k"
  linstor.csi.linbit.com/storagePool: dpool
  linstor.csi.linbit.com/nodeList: "wow-red wow-blue"
  linstor.csi.linbit.com/clientList: "wow-green"
  linstor.csi.linbit.com/placementPolicy: "Manual"
  property.linstor.csi.linbit.com/StorDriver/ZfscreateOptions: "-b 64k"
  property.linstor.csi.linbit.com/DrbdOptions/Net/max-buffers: "40000"
  property.linstor.csi.linbit.com/DrbdOptions/Disk/rs-discard-granularity: "1048576"
  property.linstor.csi.linbit.com/DrbdOptions/Net/protocol: "C"
  property.linstor.csi.linbit.com/DrbdOptions/auto-quorum: suspend-io
  property.linstor.csi.linbit.com/DrbdOptions/Resource/on-no-data-accessible: suspend-io
  property.linstor.csi.linbit.com/DrbdOptions/Resource/on-suspended-primary-outdated: force-secondary
  property.linstor.csi.linbit.com/DrbdOptions/Net/rr-conflict: retry-connect
volumeBindingMode: Immediate
reclaimPolicy: Retain
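
For reference, a PVC of this shape (name and size illustrative) is enough to reproduce the error:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-dpool-64k-rb   # illustrative name
spec:
  storageClassName: linstor-dpool-xfs-64k-rb
  accessModes: [ "ReadWriteOnce" ]
  resources:
    requests:
      storage: 1Gi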

Node list:

╭──────────────────────────────────────────────────────────╮
┊ Node      ┊ NodeType  ┊ Addresses               ┊ State  ┊
╞══════════════════════════════════════════════════════════╡
┊ wow-blue  ┊ SATELLITE ┊ 192.168.1.52:3367 (SSL) ┊ Online ┊
┊ wow-green ┊ SATELLITE ┊ 192.168.1.51:3367 (SSL) ┊ Online ┊
┊ wow-red   ┊ SATELLITE ┊ 192.168.1.50:3367 (SSL) ┊ Online ┊
╰──────────────────────────────────────────────────────────╯

Storage pool list:

╭───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ StoragePool          ┊ Node      ┊ Driver   ┊ PoolName ┊ FreeCapacity ┊ TotalCapacity ┊ CanSnapshots ┊ State ┊ SharedName                     ┊
╞═══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ DfltDisklessStorPool ┊ wow-blue  ┊ DISKLESS ┊          ┊              ┊               ┊ False        ┊ Ok    ┊ wow-blue;DfltDisklessStorPool  ┊
┊ DfltDisklessStorPool ┊ wow-green ┊ DISKLESS ┊          ┊              ┊               ┊ False        ┊ Ok    ┊ wow-green;DfltDisklessStorPool ┊
┊ DfltDisklessStorPool ┊ wow-red   ┊ DISKLESS ┊          ┊              ┊               ┊ False        ┊ Ok    ┊ wow-red;DfltDisklessStorPool   ┊
┊ dpool                ┊ wow-blue  ┊ ZFS_THIN ┊ dpool    ┊     9.69 TiB ┊     14.50 TiB ┊ True         ┊ Ok    ┊ wow-blue;dpool                 ┊
┊ dpool                ┊ wow-green ┊ ZFS_THIN ┊ dpool    ┊     7.02 TiB ┊     14.50 TiB ┊ True         ┊ Ok    ┊ wow-green;dpool                ┊
┊ dpool                ┊ wow-red   ┊ ZFS_THIN ┊ dpool    ┊     5.27 TiB ┊     14.50 TiB ┊ True         ┊ Ok    ┊ wow-red;dpool                  ┊
┊ rpool                ┊ wow-blue  ┊ ZFS_THIN ┊ rpool    ┊   196.27 GiB ┊       294 GiB ┊ True         ┊ Ok    ┊ wow-blue;rpool                 ┊
┊ rpool                ┊ wow-green ┊ ZFS_THIN ┊ rpool    ┊   229.01 GiB ┊       294 GiB ┊ True         ┊ Ok    ┊ wow-green;rpool                ┊
┊ rpool                ┊ wow-red   ┊ ZFS_THIN ┊ rpool    ┊   170.57 GiB ┊       294 GiB ┊ True         ┊ Ok    ┊ wow-red;rpool                  ┊
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

BokuNoGF · Sep 11 '25 03:09

I don't think anyone tested these parameters in the last few years.

You can achieve what you want by setting up the StorageClass with allowedTopology selecting wow-blue and wow-red. Then you can set allowRemoteVolumeAccess: false, so the PV will have a strict affinity set.

WanzenBug · Sep 11 '25 07:09

> I don't think anyone tested these parameters in the last few years.
>
> You can achieve what you want by setting up the StorageClass with allowedTopology selecting wow-blue and wow-red. Then you can set allowRemoteVolumeAccess: false, so the PV will have a strict affinity set.

Gotcha, okay, thanks for the info!

Sadly, I can't seem to find any info on an allowedTopology key for StorageClasses, either in general or in the piraeus repos.

Do you mean setting nodeAffinity or topologySpreadConstraints on the Deployment, together with volumeBindingMode: WaitForFirstConsumer on the StorageClass, so that the operator ensures the volumes are created only on the nodes that the nodeAffinity selects?

BokuNoGF · Sep 12 '25 06:09

> Sadly, I can't seem to find any info on an allowedTopology key for StorageClasses, either in general or in the piraeus repos.

https://kubernetes.io/docs/concepts/storage/storage-classes/#allowed-topologies

Basically, your storage class should look something like:

volumeBindingMode: WaitForFirstConsumer
allowedTopologies:
- matchLabelExpressions:
  - key: kubernetes.io/hostname
    values:
    - wow-red
    - wow-blue
parameters:
  allowRemoteVolumeAccess: "false"

WanzenBug · Sep 12 '25 06:09

> > Sadly, I can't seem to find any info on an allowedTopology key for StorageClasses, either in general or in the piraeus repos.
>
> https://kubernetes.io/docs/concepts/storage/storage-classes/#allowed-topologies
>
> Basically, your storage class should look something like:
>
> volumeBindingMode: WaitForFirstConsumer
> allowedTopologies:
> - matchLabelExpressions:
>   - key: kubernetes.io/hostname
>     values:
>     - wow-red
>     - wow-blue
> parameters:
>   allowRemoteVolumeAccess: "false"

Hey! Sorry for the delay, and thanks for the info, not sure how my Googling missed that one...

Did some testing, and sadly it seems allowedTopologies is not always respected for replica placement: replicas sometimes end up on nodes outside the allowed set. Below is an example where the StorageClass allows wow-red and wow-green, yet a diskful replica ended up on wow-blue.

Is there anything else I can test out to try and make sure volumes are allocated on the appropriate nodes?

Storage class:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: linstor-dpool-xfs-64k-rg
provisioner: linstor.csi.linbit.com
allowVolumeExpansion: true
allowedTopologies:
- matchLabelExpressions:
  - key: kubernetes.io/hostname
    values:
    - wow-red
    - wow-green
parameters:
  csi.storage.k8s.io/fstype: xfs
  linstor.csi.linbit.com/fsOpts: "-b size=64k -s size=4k -m rmapbt=1"
  linstor.csi.linbit.com/mountOpts: "logbsize=256k"
  linstor.csi.linbit.com/storagePool: dpool
  linstor.csi.linbit.com/autoPlace: "2"
  linstor.csi.linbit.com/allowRemoteVolumeAccess: "false"
  property.linstor.csi.linbit.com/StorDriver/ZfscreateOptions: "-b 64k"
  property.linstor.csi.linbit.com/DrbdOptions/Net/max-buffers: "40000"
  property.linstor.csi.linbit.com/DrbdOptions/Net/max-epoch-size: "10000"
  property.linstor.csi.linbit.com/DrbdOptions/Net/sndbuf-size: "10485760"
  property.linstor.csi.linbit.com/DrbdOptions/Net/rcvbuf-size: "10485760"
  property.linstor.csi.linbit.com/DrbdOptions/Disk/rs-discard-granularity: "1048576"
  property.linstor.csi.linbit.com/DrbdOptions/Net/protocol: "C"
  property.linstor.csi.linbit.com/DrbdOptions/auto-quorum: suspend-io
  property.linstor.csi.linbit.com/DrbdOptions/Resource/on-no-data-accessible: suspend-io
  property.linstor.csi.linbit.com/DrbdOptions/Resource/on-suspended-primary-outdated: force-secondary
  property.linstor.csi.linbit.com/DrbdOptions/Net/rr-conflict: retry-connect
volumeBindingMode: WaitForFirstConsumer
reclaimPolicy: Retain

PVC:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: time-machine-dpool-64k-rg
  namespace: time-machine
spec:
  storageClassName: linstor-dpool-xfs-64k-rg
  accessModes: [ "ReadWriteOnce" ]
  resources:
    requests:
      storage: 2Ti
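
With volumeBindingMode: WaitForFirstConsumer, placement only happens once a pod consumes the claim; a minimal consumer of this shape (illustrative, not the actual workload) is enough to trigger provisioning:

apiVersion: v1
kind: Pod
metadata:
  name: time-machine-test   # illustrative name
  namespace: time-machine
spec:
  containers:
  - name: app
    image: busybox          # illustrative image
    command: ["sleep", "3600"]
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: time-machine-dpool-64k-rg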

DRBD Volumes:

davs@wow-red ~> kubectl linstor v list
╭──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ Resource                                 ┊ Node      ┊ StoragePool          ┊ VolNr ┊ MinorNr ┊ DeviceName    ┊ Allocated ┊ InUse  ┊      State ┊ Repl           ┊
╞══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ pvc-d1eeb3a0-cb67-441f-9106-3125cf3bf5d0 ┊ wow-blue  ┊ dpool                ┊     0 ┊    1000 ┊ /dev/drbd1000 ┊ 13.64 GiB ┊ Unused ┊   UpToDate ┊ Established(2) ┊
┊ pvc-d1eeb3a0-cb67-441f-9106-3125cf3bf5d0 ┊ wow-green ┊ dpool                ┊     0 ┊    1000 ┊ /dev/drbd1000 ┊ 13.75 GiB ┊ InUse  ┊   UpToDate ┊ Established(2) ┊
┊ pvc-d1eeb3a0-cb67-441f-9106-3125cf3bf5d0 ┊ wow-red   ┊ DfltDisklessStorPool ┊     0 ┊    1000 ┊ /dev/drbd1000 ┊           ┊ Unused ┊ TieBreaker ┊ Established(2) ┊

BokuNoGF · Sep 14 '25 03:09

Quick update: I was able to more or less work around this by doing the following (a quick verification sketch follows the steps):

  1. Tag each node pair with the same aux property:
kubectl linstor n set-property wow-red "Aux/red-green-group" "true"
kubectl linstor n set-property wow-green "Aux/red-green-group" "true"

kubectl linstor n set-property wow-red "Aux/red-blue-group" "true"
kubectl linstor n set-property wow-blue "Aux/red-blue-group" "true"

kubectl linstor n set-property wow-green "Aux/green-blue-group" "true"
kubectl linstor n set-property wow-blue "Aux/green-blue-group" "true"
  2. Use the replicasOnSame parameter to pin the volumes created from the StorageClass to the nodes carrying the given aux property:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: linstor-dpool-64k-rg
provisioner: linstor.csi.linbit.com
allowVolumeExpansion: true
parameters:
  ...
  linstor.csi.linbit.com/replicasOnSame: "Aux/red-green-group=true"
  ...
  3. Manually create a diskless resource on the remaining node to enable quorum (disklessOnRemaining doesn't seem to work when using replicasOnSame, nor is a tiebreaker created automatically):
kubectl linstor r create --diskless wow-blue pvc-91d97812-0c9b-414b-a01b-51316849e2de
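
To double-check the result, the aux properties and the final placement can be inspected with something along these lines (resource name taken from above; assumes the standard LINSTOR client subcommands):

kubectl linstor node list-properties wow-red
kubectl linstor resource list --resources pvc-91d97812-0c9b-414b-a01b-51316849e2de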

BokuNoGF · Sep 26 '25 10:09