Re-deploy SystemVMs not working
ISSUE TYPE
- Bug Report
COMPONENT NAME
SystemVM
CLOUDSTACK VERSION
14.19.0.1
CONFIGURATION
OS / ENVIRONMENT
SUMMARY
If I remove System VMs, new VMs don't start
STEPS TO REPRODUCE
Trying to recreate System VMs and get error.
```
2024-07-01 12:41:14,293 WARN [c.c.v.VirtualMachineManagerImpl] (Work-Job-Executor-38:ctx-d0153ddf job-3/job-75 ctx-37b9fb37) (logid:1974eba6) Unable to orchestrate start VM instance {"id":10,"instanceName":"s-10-VM","type":"SecondaryStorageVm","uuid":"bb6ad9d5-0d11-4897-85d4-3434ac349362"} due to [Create volume from template (ID = 3) failed: Creation of resource definition 'cs-ead4c0e7-cfdf-4ea6-b289-95dcffcf3245' failed due to an unknown exception.].
com.cloud.utils.exception.CloudRuntimeException: Create volume from template (ID = 3) failed: Creation of resource definition 'cs-ead4c0e7-cfdf-4ea6-b289-95dcffcf3245' failed due to an unknown exception.
at org.apache.cloudstack.storage.motion.StorageSystemDataMotionStrategy.handleCreateVolumeFromTemplateBothOnStorageSystem(StorageSystemDataMotionStrategy.java:1499)
at org.apache.cloudstack.storage.motion.StorageSystemDataMotionStrategy.handleCopyAsyncForTemplateAndVolume(StorageSystemDataMotionStrategy.java:396)
at org.apache.cloudstack.storage.motion.StorageSystemDataMotionStrategy.copyAsync(StorageSystemDataMotionStrategy.java:331)
at org.apache.cloudstack.storage.motion.DataMotionServiceImpl.copyAsync(DataMotionServiceImpl.java:87)
at org.apache.cloudstack.storage.motion.DataMotionServiceImpl.copyAsync(DataMotionServiceImpl.java:116)
at org.apache.cloudstack.storage.volume.VolumeServiceImpl.createVolumeFromBaseImageAsync(VolumeServiceImpl.java:790)
at org.apache.cloudstack.storage.volume.VolumeServiceImpl.createVolumeFromTemplateAsync(VolumeServiceImpl.java:1595)
at org.apache.cloudstack.engine.orchestration.VolumeOrchestrator.recreateVolume(VolumeOrchestrator.java:1799)
at org.apache.cloudstack.engine.orchestration.VolumeOrchestrator.prepare(VolumeOrchestrator.java:1923)
at jdk.internal.reflect.GeneratedMethodAccessor328.invoke(Unknown Source)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:344)
at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:198)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:163)
at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:97)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186)
at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:215)
at com.sun.proxy.$Proxy227.prepare(Unknown Source)
at com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMachineManagerImpl.java:1268)
at com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMachineManagerImpl.java:5457)
at jdk.internal.reflect.GeneratedMethodAccessor318.invoke(Unknown Source)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at com.cloud.vm.VmWorkJobHandlerProxy.handleVmWorkJob(VmWorkJobHandlerProxy.java:107)
at com.cloud.vm.VirtualMachineManagerImpl.handleVmWorkJob(VirtualMachineManagerImpl.java:5581)
at com.cloud.vm.VmWorkJobDispatcher.runJob(VmWorkJobDispatcher.java:102)
at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.runInContext(AsyncJobManagerImpl.java:654)
at org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:48)
at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:55)
at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:102)
at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:52)
at org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:45)
at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.run(AsyncJobManagerImpl.java:602)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
```
EXPECTED RESULTS
New VMs are created and started
ACTUAL RESULTS
New VMs are created but without disks and don't start.
@UAnton , what is your hypervisor type?? and can you find any errors in the host logs ?
@DaanHoogland, I use KVM
Jul 1 11:35:28 node1 java[19561]: libvirt: QEMU Driver error : Domain not found: no domain with matching name 'v-13-VM'
Jul 1 11:35:28 node1 java[19561]: libvirt: LXC Driver error : Domain not found: No domain with matching name 'v-13-VM'
Jul 1 11:35:28 node1 java[19561]: WARN [kvm.resource.LibvirtConnection] (agentRequest-Handler-5:) (logid:2c282009) Can not find a connection for Instance v-13-VM. Assuming the default connection.
Jul 1 11:35:28 node1 java[19561]: libvirt: QEMU Driver error : Domain not found: no domain with matching name 'v-13-VM'
Jul 1 11:35:28 node1 java[19561]: message repeated 4 times: [ libvirt: QEMU Driver error : Domain not found: no domain with matching name 'v-13-VM']
Jul 1 11:35:28 node1 java[19561]: WARN [kvm.resource.LibvirtKvmAgentHook] (agentRequest-Handler-5:) (logid:2c282009) Groovy script '/etc/cloudstack/agent/hooks/libvirt-vm-state-change.groovy' is not available. Transformations will not be applied.
Jul 1 11:35:28 node1 java[19561]: WARN [kvm.resource.LibvirtKvmAgentHook] (agentRequest-Handler-5:) (logid:2c282009) Groovy scripting engine is not initialized. Data transformation skipped.
Jul 1 11:35:40 node1 Controller[2227]: 11:35:40.950 [grizzly-http-server-46] INFO LINSTOR/Controller - SYSTEM - REST/API RestClient(10.100.7.2; 'Swagger-Codegen/1.0.0/java')/CloneRscDfn
Jul 1 11:35:40 node1 Controller[2227]: 11:35:40.951 [grizzly-http-server-46] ERROR LINSTOR/Controller - SYSTEM - The creation of a new resource definition failed due to a name collision
Jul 1 11:35:40 node1 Controller[2227]: 11:35:40.962 [grizzly-http-server-47] INFO LINSTOR/Controller - SYSTEM - REST/API RestClient(10.100.7.2; 'Swagger-Codegen/1.0.0/java')/DelRscDfn
Jul 1 11:35:40 node1 Controller[2227]: 11:35:40.962 [grizzly-http-server-47] WARN LINSTOR/Controller - SYSTEM - Resource definition: cs-null not found.
@DaanHoogland, I use KVM
Jul 1 11:35:28 node1 java[19561]: libvirt: QEMU Driver error : Domain not found: no domain with matching name 'v-13-VM' Jul 1 11:35:28 node1 java[19561]: libvirt: LXC Driver error : Domain not found: No domain with matching name 'v-13-VM' Jul 1 11:35:28 node1 java[19561]: WARN [kvm.resource.LibvirtConnection] (agentRequest-Handler-5:) (logid:2c282009) Can not find a connection for Instance v-13-VM. Assuming the default connection. Jul 1 11:35:28 node1 java[19561]: libvirt: QEMU Driver error : Domain not found: no domain with matching name 'v-13-VM' Jul 1 11:35:28 node1 java[19561]: message repeated 4 times: [ libvirt: QEMU Driver error : Domain not found: no domain with matching name 'v-13-VM'] Jul 1 11:35:28 node1 java[19561]: WARN [kvm.resource.LibvirtKvmAgentHook] (agentRequest-Handler-5:) (logid:2c282009) Groovy script '/etc/cloudstack/agent/hooks/libvirt-vm-state-change.groovy' is not available. Transformations will not be applied. Jul 1 11:35:28 node1 java[19561]: WARN [kvm.resource.LibvirtKvmAgentHook] (agentRequest-Handler-5:) (logid:2c282009) Groovy scripting engine is not initialized. Data transformation skipped. Jul 1 11:35:40 node1 Controller[2227]: 11:35:40.950 [grizzly-http-server-46] INFO LINSTOR/Controller - SYSTEM - REST/API RestClient(10.100.7.2; 'Swagger-Codegen/1.0.0/java')/CloneRscDfn Jul 1 11:35:40 node1 Controller[2227]: 11:35:40.951 [grizzly-http-server-46] ERROR LINSTOR/Controller - SYSTEM - The creation of a new resource definition failed due to a name collision Jul 1 11:35:40 node1 Controller[2227]: 11:35:40.962 [grizzly-http-server-47] INFO LINSTOR/Controller - SYSTEM - REST/API RestClient(10.100.7.2; 'Swagger-Codegen/1.0.0/java')/DelRscDfn Jul 1 11:35:40 node1 Controller[2227]: 11:35:40.962 [grizzly-http-server-47] WARN LINSTOR/Controller - SYSTEM - Resource definition: cs-null not found.
@rp- , can you explain this log output?
@DaanHoogland, I use KVM
Jul 1 11:35:28 node1 java[19561]: libvirt: QEMU Driver error : Domain not found: no domain with matching name 'v-13-VM' Jul 1 11:35:28 node1 java[19561]: libvirt: LXC Driver error : Domain not found: No domain with matching name 'v-13-VM' Jul 1 11:35:28 node1 java[19561]: WARN [kvm.resource.LibvirtConnection] (agentRequest-Handler-5:) (logid:2c282009) Can not find a connection for Instance v-13-VM. Assuming the default connection. Jul 1 11:35:28 node1 java[19561]: libvirt: QEMU Driver error : Domain not found: no domain with matching name 'v-13-VM' Jul 1 11:35:28 node1 java[19561]: message repeated 4 times: [ libvirt: QEMU Driver error : Domain not found: no domain with matching name 'v-13-VM'] Jul 1 11:35:28 node1 java[19561]: WARN [kvm.resource.LibvirtKvmAgentHook] (agentRequest-Handler-5:) (logid:2c282009) Groovy script '/etc/cloudstack/agent/hooks/libvirt-vm-state-change.groovy' is not available. Transformations will not be applied. Jul 1 11:35:28 node1 java[19561]: WARN [kvm.resource.LibvirtKvmAgentHook] (agentRequest-Handler-5:) (logid:2c282009) Groovy scripting engine is not initialized. Data transformation skipped. Jul 1 11:35:40 node1 Controller[2227]: 11:35:40.950 [grizzly-http-server-46] INFO LINSTOR/Controller - SYSTEM - REST/API RestClient(10.100.7.2; 'Swagger-Codegen/1.0.0/java')/CloneRscDfn Jul 1 11:35:40 node1 Controller[2227]: 11:35:40.951 [grizzly-http-server-46] ERROR LINSTOR/Controller - SYSTEM - The creation of a new resource definition failed due to a name collision Jul 1 11:35:40 node1 Controller[2227]: 11:35:40.962 [grizzly-http-server-47] INFO LINSTOR/Controller - SYSTEM - REST/API RestClient(10.100.7.2; 'Swagger-Codegen/1.0.0/java')/DelRscDfn Jul 1 11:35:40 node1 Controller[2227]: 11:35:40.962 [grizzly-http-server-47] WARN LINSTOR/Controller - SYSTEM - Resource definition: cs-null not found.@rp- , can you explain this log output?
"Funny" I just fixed the Jul 1 11:35:40 node1 Controller[2227]: 11:35:40.962 [grizzly-http-server-47] WARN LINSTOR/Controller - SYSTEM - Resource definition: cs-null not found. bug
This happens, while CloudStack tries to delete a volume that wasn't correctly created, but as it didn't get the volume.setPath() called, the deleteAsync tries to delete a resource with the name cs-null.
I'm right now testing the 3 loc fix and will open a PR soon.
The question that remains is, why was the first volume creation not successful.
A workaround meanwhile is to delete the Linstor resource that is trying to be created with the same name. The uuid should be somewhere next to the name collision errors:
linstor rd d cs-UUID
@rp- root@node1:~# linstor rd l
╭────────────────────────────────────────────────────────────────────────╮
┊ ResourceName ┊ Port ┊ ResourceGroup ┊ State ┊
╞════════════════════════════════════════════════════════════════════════╡
┊ cs-03ea9c1c-fdf7-41e3-9670-279b1086adbf ┊ 7005 ┊ vdiskrg ┊ ok ┊
┊ cs-9dcca1bc-f31d-4c24-ae2b-dba6d0e0d2ce ┊ 7009 ┊ vdiskrg ┊ ok ┊
┊ cs-94e79c39-01c7-418a-8a58-c38322b98903 ┊ 7004 ┊ vdiskrg ┊ ok ┊
┊ cs-589fab5f-25b1-48ad-8c68-990cbbecc6de ┊ 7002 ┊ vdiskrg ┊ ok ┊
┊ cs-bb428f3a-309e-11ef-8e9c-005056a8c932 ┊ 7000 ┊ vdiskrg ┊ ok ┊
┊ cs-c2ed74b7-371c-4d01-b934-10c464e35647 ┊ 7010 ┊ vdiskrg ┊ ok ┊
┊ cs-c937df7a-0dff-43e7-8635-a1e64eae6415 ┊ 7008 ┊ vdiskrg ┊ ok ┊
┊ cs-cd8eaf3e-f340-4c5e-a06b-163851723b98 ┊ 7001 ┊ vdiskrg ┊ ok ┊
┊ cs-d0f3747e-ab5f-4be6-939d-88214e70d228 ┊ 7006 ┊ vdiskrg ┊ ok ┊
┊ cs-dc078cfc-d07f-4993-a020-bbfcf7a40379 ┊ 7003 ┊ vdiskrg ┊ ok ┊
┊ cs-ead4c0e7-cfdf-4ea6-b289-95dcffcf3245 ┊ 7007 ┊ vdiskrg ┊ ok ┊
I try to delete but this doesn't help. New resources were created after deploying the new VM and got the same error.
Then on the first attempt, there must be a different error than the name collision error. And we would need to fix this.
I will delete all resources from storage and try to redeploy VMs.
Have the SystemVM's worked already? or did they never boot correctly?
@rp- They are not created and boot correctly. If I don't press Rebbot - any resources aren't created in Linstor. This is the log from agent.log after I deleted SystemVM. Cloudstack tries to deploy the new.
2024-07-01 15:57:14,051 INFO [kvm.resource.LibvirtConnection] (agentRequest-Handler-1:null) (logid:1974eba6) No existing libvirtd connection found. Opening a new one
2024-07-01 15:57:14,053 WARN [kvm.resource.LibvirtConnection] (agentRequest-Handler-1:null) (logid:1974eba6) Can not find a connection for Instance s-20-VM. Assuming the default connection.
2024-07-01 15:57:14,161 WARN [kvm.resource.LibvirtKvmAgentHook] (agentRequest-Handler-1:null) (logid:1974eba6) Groovy script '/etc/cloudstack/agent/hooks/libvirt-vm-state-change.groovy' is not available. Transformations will not be applied.
2024-07-01 15:57:14,162 WARN [kvm.resource.LibvirtKvmAgentHook] (agentRequest-Handler-1:null) (logid:1974eba6) Groovy scripting engine is not initialized. Data transformation skipped.
2024-07-01 15:57:14,224 INFO [kvm.resource.LibvirtConnection] (agentRequest-Handler-3:null) (logid:b148d17a) No existing libvirtd connection found. Opening a new one
2024-07-01 15:57:14,225 WARN [kvm.resource.LibvirtConnection] (agentRequest-Handler-3:null) (logid:b148d17a) Can not find a connection for Instance v-21-VM. Assuming the default connection.
2024-07-01 15:57:14,336 WARN [kvm.resource.LibvirtKvmAgentHook] (agentRequest-Handler-3:null) (logid:b148d17a) Groovy script '/etc/cloudstack/agent/hooks/libvirt-vm-state-change.groovy' is not available. Transformations will not be applied.
2024-07-01 15:57:14,336 WARN [kvm.resource.LibvirtKvmAgentHook] (agentRequest-Handler-3:null) (logid:b148d17a) Groovy scripting engine is not initialized. Data transformation skipped.
@rp- They are not created and boot correctly. If I don't press Rebbot - any resources aren't created in Linstor. This is the log from agent.log after I deleted SystemVM. Cloudstack tries to deploy the new.
2024-07-01 15:57:14,051 INFO [kvm.resource.LibvirtConnection] (agentRequest-Handler-1:null) (logid:1974eba6) No existing libvirtd connection found. Opening a new one 2024-07-01 15:57:14,053 WARN [kvm.resource.LibvirtConnection] (agentRequest-Handler-1:null) (logid:1974eba6) Can not find a connection for Instance s-20-VM. Assuming the default connection. 2024-07-01 15:57:14,161 WARN [kvm.resource.LibvirtKvmAgentHook] (agentRequest-Handler-1:null) (logid:1974eba6) Groovy script '/etc/cloudstack/agent/hooks/libvirt-vm-state-change.groovy' is not available. Transformations will not be applied. 2024-07-01 15:57:14,162 WARN [kvm.resource.LibvirtKvmAgentHook] (agentRequest-Handler-1:null) (logid:1974eba6) Groovy scripting engine is not initialized. Data transformation skipped. 2024-07-01 15:57:14,224 INFO [kvm.resource.LibvirtConnection] (agentRequest-Handler-3:null) (logid:b148d17a) No existing libvirtd connection found. Opening a new one 2024-07-01 15:57:14,225 WARN [kvm.resource.LibvirtConnection] (agentRequest-Handler-3:null) (logid:b148d17a) Can not find a connection for Instance v-21-VM. Assuming the default connection. 2024-07-01 15:57:14,336 WARN [kvm.resource.LibvirtKvmAgentHook] (agentRequest-Handler-3:null) (logid:b148d17a) Groovy script '/etc/cloudstack/agent/hooks/libvirt-vm-state-change.groovy' is not available. Transformations will not be applied. 2024-07-01 15:57:14,336 WARN [kvm.resource.LibvirtKvmAgentHook] (agentRequest-Handler-3:null) (logid:b148d17a) Groovy scripting engine is not initialized. Data transformation skipped.
I would be more interested in the management server log and also if the template volume contains a correct systemvm image. Maybe it wasn't copied correctly and is now corrupt.
linstor rd l -s Aux/cs-name Aux/cs-vm-name
helps to find the systemvm image resource name
(v-21-VM) Unable to orchestrate start VM instance {"id":21,"instanceName":"v-21-VM","type":"ConsoleProxy","uuid":"2fd470d1-7a05-42eb-bcda-01d6bb5ba8d3"} due to [Create volume from template (ID = 3) failed: Resource definition 'cs-bb428f3a-309e-11ef-8e9c-005056a8c932' not found.].
(v-21-VM) Unable to orchestrate start VM instance {"id":21,"instanceName":"v-21-VM","type":"ConsoleProxy","uuid":"2fd470d1-7a05-42eb-bcda-01d6bb5ba8d3"} due to [Create volume from template (ID = 3) failed: Resource definition 'cs-bb428f3a-309e-11ef-8e9c-005056a8c932' not found.].
I'm afraid this is now, because you deleted the template copy and CloudStack doesn't know that it isn't there anymore. I think the easier recover here is to delete the current Linstor primary storage and create a new one.
How can I delete storage? I don't see a Delete button.
How can I delete storage? I don't see a Delete button.
You have to put it into maintenance first.
Thanks! Now all ok!
fixed in #9325