
lxd/storage/drivers: Consistently set `VolumeMultiNode` (from Incus)

Open · kadinsayani opened this pull request 6 months ago · 2 comments

Cherry-picked from https://github.com/lxc/incus/pull/1019.

kadinsayani · Jun 20 '25 20:06

I believe this has exposed a bug. Looking into this further.

kadinsayani · Jun 20 '25 20:06

More cherry-picks required, loading...

kadinsayani · Jun 20 '25 20:06

Agreed w/ Tom on VolumeMultiNode/Remote:

| driver | VolumeMultiNode | Remote |
| ------ | --------------- | ------ |
| zfs    | N/A             | false  |
| ceph   | false           | true   |
| cephfs | true            | true   |
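To make the table concrete, here's a trimmed-down Go sketch of how a driver could advertise these capabilities. The real drivers.Info struct in lxd/storage/drivers carries many more fields; the values below simply mirror the table, with N/A represented as false:

```go
package drivers

// Info is a simplified sketch of the capability flags each storage driver
// reports; only the two flags discussed here are shown.
type Info struct {
	Remote          bool // storage is accessible from every cluster member
	VolumeMultiNode bool // a volume can safely be used on multiple members at once
}

// Illustrative values matching the table above. For local drivers such as
// zfs, VolumeMultiNode is effectively N/A, represented here as false.
var driverInfo = map[string]Info{
	"zfs":    {Remote: false, VolumeMultiNode: false},
	"ceph":   {Remote: true, VolumeMultiNode: false},
	"cephfs": {Remote: true, VolumeMultiNode: true},
}
```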

Since there's no use case/reproducer for this in Incus, I'd lean towards rejecting these.

MggMuggins · Jun 23 '25 13:06

@tomponline @MggMuggins thanks for your input on this. I'll go ahead and close this pull request and open a separate issue for auditing and reworking the logic in VolumeUsedByExclusiveRemoteInstancesWithProfiles and checkBlockVolSharing.

kadinsayani · Jun 24 '25 20:06

@roosterfish did you have a reproducer or info on the issue you faced regarding something similar to this that you mentioned in Frankfurt?

tomponline · Jun 24 '25 21:06

Yup, I think there is a difference between:

  • the volume can be mapped to two systems at the same time and
  • the volume can be used (read/write) by two systems at the same time (which I think is what VolumeMultiNode is used for)

For example, PowerFlex allows mapping the volume on two hosts at the same time, but this of course would cause issues when reading/writing from both ends in parallel. Therefore we cannot set VolumeMultiNode=true for PowerFlex. I guess this is also true for all the other remote drivers except CephFS. Probably the security.shared check consults the value of the driver's VolumeMultiNode, so this PR might have changed the behavior.

@tomponline I hardly remember, but it was about the work that Pedro once did on blocking the attachment of volumes to multiple instances when they are already in use. IIRC the volume's security.shared currently only checks per host (volumes on local pools), but if you want to enable sharing on a remote volume, I think security.shared=true doesn't apply and you still cannot attach the same remote volume to multiple instances even if sharing is allowed. In this example I don't know whether it makes a difference if the instances are on the same or different cluster members.

I have to test this again to be fully sure this is the current state, but it was something along those lines.

roosterfish · Jun 25 '25 08:06

> For example, PowerFlex allows mapping the volume on two hosts at the same time, but this of course would cause issues when reading/writing from both ends in parallel. Therefore we cannot set VolumeMultiNode=true for PowerFlex. I guess this is also true for all the other remote drivers except CephFS.

Yes, this is true for all remote storage drivers except cephfs, which has VolumeMultiNode=true.

> Probably the security.shared check consults the value of the driver's VolumeMultiNode, so this PR might have changed the behavior.

For all remote storage drivers, LXD still maps the volume on multiple nodes at the same time to facilitate live migration, irrespective of the VolumeMultiNode setting.

The security.shared option is related, but it allows overriding the safety policy, even for non-remote storage drivers, and allows attachment of a block volume to multiple instances concurrently.

So:

  • Remote indicates whether the driver provides remote storage.
  • VolumeMultiNode indicates whether it is safe to activate and use volumes on multiple members at the same time.

Then on a single member there are further restrictions based on volume type:

  • Filesystem: can be mounted on at least 1 member and shared with multiple instances concurrently.
  • Block: can be activated on only 1 member and shared with only 1 instance concurrently.

The security.shared option overrides that last restriction.
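Putting those rules together, here is a hedged sketch of that policy as a single function. The name canAttach and its parameters are hypothetical, invented purely to illustrate the decision flow, and don't correspond to actual LXD code:

```go
package drivers

// canAttach is a hypothetical helper illustrating the policy above; the
// name, signature, and parameters are invented for illustration and do
// not exist in the LXD codebase.
func canAttach(volumeMultiNode bool, contentType string, securityShared bool,
	usedByOtherInstance, usedOnOtherMember bool) bool {
	// Using the volume from a second cluster member is only safe when the
	// driver reports VolumeMultiNode=true (e.g. cephfs).
	if usedOnOtherMember && !volumeMultiNode {
		return false
	}

	switch contentType {
	case "filesystem":
		// Filesystem volumes may be shared with multiple instances concurrently.
		return true
	case "block":
		// Block volumes are limited to one instance at a time, unless
		// security.shared=true overrides that safety policy.
		return !usedByOtherInstance || securityShared
	default:
		return false
	}
}
```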

But I do think we need to review the relationship between security.shared, volume type, and VolumeMultiNode so that we have one function that checks if it's OK to use a volume with an instance.

In the past when we've faced similar refactorings that are partially dependent on volume type/config, we've delegated this out to a new function in the storage driver that takes the volume in question, e.g.

isBlockBacked(vol Volume) bool

So maybe we actually get rid of VolumeMultiNode entirely, and instead have a function on the storage driver that takes a volume and checks whether it's allowed to be used.
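Following the isBlockBacked precedent, a rough sketch of what that per-volume driver method could look like; the method name, the Volume stub, and the cephfs receiver below are all hypothetical:

```go
package drivers

// Volume is a minimal stand-in for the real drivers.Volume type.
type Volume struct {
	contentType string // "filesystem" or "block"
}

type cephfs struct{}

// VolumeUsableOnMultipleNodes is a hypothetical replacement for the static
// VolumeMultiNode flag: each driver decides per volume instead of per
// driver. For a shared filesystem like cephfs, filesystem volumes are safe
// to use from several members at once, while block-mode volumes would not be.
func (d *cephfs) VolumeUsableOnMultipleNodes(vol Volume) bool {
	return vol.contentType == "filesystem"
}
```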

tomponline · Jun 25 '25 08:06