cloudstack

Live migration of encrypted volume fails with NFS

Open • borisstoyanov opened this issue • 7 comments

ISSUE TYPE
  • Bug Report
COMPONENT NAME
API
CLOUDSTACK VERSION
4.18.0, 4.18.1
CONFIGURATION

NFS shared storage

OS / ENVIRONMENT
SUMMARY

When migrating a VM together with its volumes via the 'migrateVirtualMachineWithVolume' API, I get an exception about a missing libvirt secret.

STEPS TO REPRODUCE
1. Deploy a VM.
2. Add an encrypted data disk.
3. Call migrateVirtualMachineWithVolume to migrate the VM to another host and its volumes to another storage pool.
4. Observe the error.
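Step 3 can also be issued as a raw API request. The sketch below only builds the query string for migrateVirtualMachineWithVolume; it assumes the virtualmachineid, hostid, migrateto[i].volume and migrateto[i].pool parameter names from the CloudStack API reference, uses placeholder UUIDs, and leaves out API-key signing and the HTTP call itself.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.StringJoiner;
import java.util.TreeMap;

public class MigrateWithVolumeRequest {

    // Builds the unsigned query string for migrateVirtualMachineWithVolume.
    // Assumption: parameter names (virtualmachineid, hostid, migrateto[i].volume,
    // migrateto[i].pool) follow the CloudStack API reference; signing is omitted.
    static String buildQuery(String vmId, String destHostId, Map<String, String> volumeToPool) {
        // TreeMap keeps the parameters sorted by name, giving a stable order.
        Map<String, String> params = new TreeMap<>();
        params.put("command", "migrateVirtualMachineWithVolume");
        params.put("virtualmachineid", vmId);
        params.put("hostid", destHostId);
        int index = 0;
        for (Map.Entry<String, String> mapping : volumeToPool.entrySet()) {
            params.put("migrateto[" + index + "].volume", mapping.getKey());
            params.put("migrateto[" + index + "].pool", mapping.getValue());
            index++;
        }
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> param : params.entrySet()) {
            // Only the values are URL-encoded; the bracketed keys are kept as-is.
            query.add(param.getKey() + "=" + URLEncoder.encode(param.getValue(), StandardCharsets.UTF_8));
        }
        return query.toString();
    }

    public static void main(String[] args) {
        // Placeholder UUIDs: move the VM to another host and its encrypted
        // data volume to a different NFS pool, as in step 3.
        String query = buildQuery("vm-uuid", "dest-host-uuid",
                Map.of("encrypted-volume-uuid", "dest-nfs-pool-uuid"));
        System.out.println(query);
    }
}
```

Moving the encrypted data volume to a different pool (not just the VM to a different host) is what exercises the failing code path discussed later in the thread.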
EXPECTED RESULTS
The migration should succeed.
ACTUAL RESULTS
The migration fails with the following agent log:
2023-11-21 06:35:03,397 INFO  [resource.wrapper.LibvirtMigrateCommandWrapper] (agentRequest-Handler-1:null) (logid:455553da) Migration thread of VM [i-2-3-VM] finished.
2023-11-21 06:35:03,397 DEBUG [agent.properties.AgentPropertiesFileHandler] (agentRequest-Handler-1:null) (logid:455553da) Property [vm.migrate.domain.retrieve.timeout] has empty or null value. Using default value [10].
2023-11-21 06:35:03,398 ERROR [resource.wrapper.LibvirtMigrateCommandWrapper] (agentRequest-Handler-1:null) (logid:455553da) Can't migrate domain [i-2-3-VM] due to: [org.libvirt.LibvirtException: Secret not found: no secret with matching uuid '06292cd0-349c-32d9-b0d4-bfaaf7844efa'].
java.util.concurrent.ExecutionException: org.libvirt.LibvirtException: Secret not found: no secret with matching uuid '06292cd0-349c-32d9-b0d4-bfaaf7844efa'
	at java.base/java.util.concurrent.FutureTask.report(FutureTask.java:122)
	at java.base/java.util.concurrent.FutureTask.get(FutureTask.java:205)
	at com.cloud.hypervisor.kvm.resource.wrapper.LibvirtMigrateCommandWrapper.execute(LibvirtMigrateCommandWrapper.java:296)
	at com.cloud.hypervisor.kvm.resource.wrapper.LibvirtMigrateCommandWrapper.execute(LibvirtMigrateCommandWrapper.java:86)
	at com.cloud.hypervisor.kvm.resource.wrapper.LibvirtRequestWrapper.execute(LibvirtRequestWrapper.java:78)
	at com.cloud.hypervisor.kvm.resource.LibvirtComputingResource.executeRequest(LibvirtComputingResource.java:1848)
	at com.cloud.agent.Agent.processRequest(Agent.java:662)
	at com.cloud.agent.Agent$AgentRequestHandler.doTask(Agent.java:1082)
	at com.cloud.utils.nio.Task.call(Task.java:83)
	at com.cloud.utils.nio.Task.call(Task.java:29)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: org.libvirt.LibvirtException: Secret not found: no secret with matching uuid '06292cd0-349c-32d9-b0d4-bfaaf7844efa'
	at org.libvirt.ErrorHandler.processError(Unknown Source)
	at org.libvirt.ErrorHandler.processError(Unknown Source)
	at org.libvirt.Domain.migrate(Unknown Source)
	at com.cloud.hypervisor.kvm.resource.MigrateKVMAsync.call(MigrateKVMAsync.java:124)
	at com.cloud.hypervisor.kvm.resource.MigrateKVMAsync.call(MigrateKVMAsync.java:27)
	... 4 more

borisstoyanov, Nov 21 '23

It looks like the same applies to PowerFlex storage as well; there, the migration is blocked at the service layer itself:

if (srcStoragePoolVO.isManaged() && srcStoragePoolVO.getId() != destStoragePoolVO.getId()) {
    throw new CloudRuntimeException("Migrating a volume online with KVM from managed storage is not currently supported.");
}

We can treat this as an enhancement: allow the "migrateVMwithVolumes" API to also handle volume migration for both managed and NFS storage.

harikrishna-patnala, Nov 21 '23

@borisstoyanov I think version 4.18.0 also has the same issue. I updated the version in the description.

harikrishna-patnala, Nov 21 '23


@harikrishna-patnala do we need to set the milestone?

weizhouapache, Nov 21 '23

@harikrishna-patnala setting the milestone to 4.18.2 for now; please update it as needed.

DaanHoogland, Jan 15 '24

This happens when the VM is live migrated and, at the same time, an encrypted data volume is migrated to a different storage pool. If the data volume is not explicitly moved to a different pool, the test case may pass.

In StorageSystemDataMotionStrategy.copyAsync():

                if (isNonManagedNfsToNfsOrSharedMountPointToNfs) {
                    migrateDiskInfo = new MigrateCommand.MigrateDiskInfo(srcVolumeInfo.getPath(),
                            MigrateCommand.MigrateDiskInfo.DiskType.FILE,
                            MigrateCommand.MigrateDiskInfo.DriverType.QCOW2,
                            MigrateCommand.MigrateDiskInfo.Source.FILE,
                            connectHostToVolume(destHost, destVolumeInfo.getPoolId(), volumeIdentifier));
                } else {
                    migrateDiskInfo = configureMigrateDiskInfo(srcVolumeInfo, destPath);
                    migrateDiskInfo.setSourceDiskOnStorageFileSystem(isStoragePoolTypeOfFile(sourceStoragePool));
                    migrateDiskInfoList.add(migrateDiskInfo);
                    prepareDiskWithSecretConsumerDetail(vmTO, srcVolumeInfo, destVolumeInfo.getPath());
                }

prepareDiskWithSecretConsumerDetail(vmTO, srcVolumeInfo, destVolumeInfo.getPath()) also needs to be called in the isNonManagedNfsToNfsOrSharedMountPointToNfs branch; otherwise, the secret on the destination host is configured with the source volume's path.
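A minimal model of the suggested change, assuming simplified hypothetical types: the VM's disks are represented as a map from disk path to secret consumer detail, and prepareDiskWithSecretConsumerDetail is reduced to updating that map. This is not the CloudStack implementation; it only illustrates that calling the method on the NFS-to-NFS branch binds the secret to the destination path instead of the source path.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SecretConsumerFixSketch {

    // Hypothetical model of the VM's disks: disk path -> secret consumer detail.
    // Before any preparation, each disk's secret is bound to its own (source) path.
    static Map<String, String> vmDisks(String... paths) {
        Map<String, String> disks = new LinkedHashMap<>();
        for (String path : paths) {
            disks.put(path, path);
        }
        return disks;
    }

    // Hypothetical stand-in for prepareDiskWithSecretConsumerDetail(vmTO, srcVolumeInfo, destPath):
    // rebinds the matching disk's secret consumer to the destination path.
    static void prepareDiskWithSecretConsumerDetail(Map<String, String> disks, String srcPath, String destPath) {
        disks.computeIfPresent(srcPath, (path, consumer) -> destPath);
    }

    // Mirrors only the secret handling of the two branches in copyAsync().
    static void prepareVolume(Map<String, String> disks, String srcPath, String destPath,
                              boolean nonManagedNfsToNfs, boolean applyFix) {
        if (nonManagedNfsToNfs) {
            // Current code skips the call here; the proposed fix adds it.
            if (applyFix) {
                prepareDiskWithSecretConsumerDetail(disks, srcPath, destPath);
            }
        } else {
            prepareDiskWithSecretConsumerDetail(disks, srcPath, destPath);
        }
    }

    public static void main(String[] args) {
        Map<String, String> current = vmDisks("src-vol");
        prepareVolume(current, "src-vol", "dest-vol", true, false);
        System.out.println("current:  " + current.get("src-vol"));

        Map<String, String> proposed = vmDisks("src-vol");
        prepareVolume(proposed, "src-vol", "dest-vol", true, true);
        System.out.println("proposed: " + proposed.get("src-vol"));
    }
}
```

With applyFix set to false, the consumer stays at the source path, matching the behaviour described in the comment; with applyFix set to true, it points to the destination path. When the volume stays on its pool (srcPath equals destPath), both variants give the same result, which is consistent with the observation that the test case may pass in that case.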

This code seems to have been present since volume encryption was first introduced in 4.18.0.

abh1sar, Jun 11 '24

@sureshanaparti, isn't this considered a serious issue?

rohityadavcloud, Jun 29 '24


@rohityadavcloud this has been present since the volume encryption feature was introduced (in 4.18.0). It seems to be an improvement on top of the current volume encryption functionality, and it needs proper testing. I moved it to the next milestone for now; any concerns?

sureshanaparti, Jun 29 '24