cloudstack
After upgrade from 4.18.0 to 4.18.1 cloudstack-agent not starting
ISSUE TYPE
- Other
COMPONENT NAME
CLOUDSTACK VERSION
CONFIGURATION
OS / ENVIRONMENT
SUMMARY
We are trying to upgrade from 4.18.0 to 4.18.1.
We have upgraded the management node, and it is up with systemVM version 4.18.1.
While upgrading the hypervisors, cloudstack-agent does not start after the package upgrade.
Logs are below:
2024-02-02 14:26:39,507 INFO [cloud.agent.AgentShell] (main:null) (logid:) Implementation Version is 4.18.1.0
2024-02-02 14:26:39,508 INFO [cloud.agent.AgentShell] (main:null) (logid:) agent.properties found at /etc/cloudstack/agent/agent.properties
2024-02-02 14:26:39,546 INFO [cloud.agent.AgentShell] (main:null) (logid:) Defaulting to using properties file for storage
2024-02-02 14:26:39,546 INFO [cloud.agent.AgentShell] (main:null) (logid:) Defaulting to the constant time backoff algorithm
2024-02-02 14:26:39,580 INFO [cloud.utils.LogUtils] (main:null) (logid:) log4j configuration found at /etc/cloudstack/agent/log4j-cloud.xml
2024-02-02 14:26:39,581 INFO [cloud.agent.AgentShell] (main:null) (logid:) Using default Java settings for IPv6 preference for agent connection
2024-02-02 14:26:39,655 INFO [cloud.agent.Agent] (main:null) (logid:) id is 0
2024-02-02 14:26:39,665 ERROR [kvm.resource.LibvirtComputingResource] (main:null) (logid:) uefi properties file not found due to: Unable to find file uefi.properties.
2024-02-02 14:26:39,706 INFO [kvm.resource.LibvirtComputingResource] (main:null) (logid:) Failed to find passphrase for keystore: cloud.jks
2024-02-02 14:26:39,709 INFO [kvm.resource.LibvirtConnection] (main:null) (logid:) No existing libvirtd connection found. Opening a new one
2024-02-02 14:26:39,799 WARN [kvm.resource.LibvirtComputingResource] (main:null) (logid:) Ignoring libvirt error.
org.libvirt.LibvirtException: Network not found: no network with matching name 'default'
at org.libvirt.ErrorHandler.processError(Unknown Source)
at org.libvirt.ErrorHandler.processError(Unknown Source)
at org.libvirt.Connect.networkLookupByName(Unknown Source)
at com.cloud.hypervisor.kvm.resource.LibvirtComputingResource.configure(LibvirtComputingResource.java:1081)
at com.cloud.agent.Agent.
We are using a native KVM bridge for networking.
On the management server we see the following exception:
2024-02-02 14:46:06,722 DEBUG [c.c.a.m.AgentManagerImpl] (AgentConnectTaskPool-1175:ctx-9a210df2) (logid:139886e2) Failed to handle host connection: java.lang.IllegalArgumentException: Can't add host: x.x.x.x with hostOS, "Red Hat Enterprise Linux"into a cluster, in which there are "Oracle Linux Server" hosts added.
STEPS TO REPRODUCE
EXPECTED RESULTS
ACTUAL RESULTS
@yashi4engg this is the same issue as https://github.com/apache/cloudstack/issues/8026; you may find the workaround in the comments.
@weizhouapache -- we tried the workaround of replacing the content of redhat-release with that of the oracle-release file, and we are now able to add the node to the cluster. But we are now unable to create a VM, with the error below, even though we have enough resources.
2024-02-05 14:44:19,773 ERROR [c.c.a.ApiAsyncJobDispatcher] (API-Job-Executor-14:ctx-5789063c job-295587) (logid:5f922a22) Unexpected exception while executing org.apache.cloudstack.api.command.admin.vm.DeployVMCmdByAdmin
com.cloud.utils.exception.CloudRuntimeException: Unable to start a VM [5ece1bb3-22c0-4482-86b3-eff04b2b7e38] due to [Unable to create a deployment for VM instance {"id":89770,"instanceName":"xyz-VM","type":"User","uuid":"5ece1bb3-22c0-4482-86b3-eff04b2b7e38"}].
at com.cloud.vm.VirtualMachineManagerImpl.start(VirtualMachineManagerImpl.java:841)
at org.apache.cloudstack.engine.cloud.entity.api.VMEntityManagerImpl.deployVirtualMachine(VMEntityManagerImpl.java:246)
at org.apache.cloudstack.engine.cloud.entity.api.VirtualMachineEntityImpl.deploy(VirtualMachineEntityImpl.java:214)
at com.cloud.vm.UserVmManagerImpl.startVirtualMachine(UserVmManagerImpl.java:5401)
at com.cloud.vm.UserVmManagerImpl.startVirtualMachine(UserVmManagerImpl.java:5251)
at com.cloud.vm.UserVmManagerImpl.startVirtualMachine(UserVmManagerImpl.java:4876)
at com.cloud.vm.UserVmManagerImpl.startVirtualMachine(UserVmManagerImpl.java:4865)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:344)
at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:198)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:163)
at org.apache.cloudstack.network.contrail.management.EventUtils$EventInterceptor.invoke(EventUtils.java:107)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:175)
at com.cloud.event.ActionEventInterceptor.invoke(ActionEventInterceptor.java:52)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:175)
at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:97)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186)
at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:215)
at com.sun.proxy.$Proxy185.startVirtualMachine(Unknown Source)
at org.apache.cloudstack.api.command.user.vm.DeployVMCmd.execute(DeployVMCmd.java:754)
at com.cloud.api.ApiDispatcher.dispatch(ApiDispatcher.java:163)
at com.cloud.api.ApiAsyncJobDispatcher.runJob(ApiAsyncJobDispatcher.java:112)
at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.runInContext(AsyncJobManagerImpl.java:620)
at org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:48)
at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:55)
at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:102)
at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:52)
at org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:45)
at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.run(AsyncJobManagerImpl.java:568)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: com.cloud.exception.InsufficientServerCapacityException: Unable to create a deployment for VM instance {"id":89770,"instanceName":"xyz-VM","type":"User","uuid":"5ece1bb3-22c0-4482-86b3-eff04b2b7e38"}Scope=interface com.cloud.dc.DataCenter; id=1
at com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMachineManagerImpl.java:1226)
at com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMachineManagerImpl.java:5412)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
... 18 more
2024-02-05 14:44:19,778 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (API-Job-Executor-14:ctx-5789063c job-295587) (logid:5f922a22) Complete async job-295587, jobStatus: FAILED, resultCode: 530, result: org.apache.cloudstack.api.response.ExceptionResponse/null/{"uuidList":[],"errorcode":"530","errortext":"Unable to start a VM [5ece1bb3-22c0-4482-86b3-eff04b2b7e38] due to [Unable to create a deployment for VM instance {"id":89770,"instanceName":"xyz-VM","type":"User","uuid":"5ece1bb3-22c0-4482-86b3-eff04b2b7e38"}]."}
On the hypervisor side we can see the error below in the agent logs:
2024-02-05 19:44:07,772 INFO [kvm.resource.LibvirtConnection] (main:null) (logid:) No existing libvirtd connection found. Opening a new one
2024-02-05 19:44:07,886 WARN [kvm.resource.LibvirtComputingResource] (main:null) (logid:) Ignoring libvirt error.
org.libvirt.LibvirtException: Network not found: no network with matching name 'default'
at org.libvirt.ErrorHandler.processError(Unknown Source)
at org.libvirt.ErrorHandler.processError(Unknown Source)
at org.libvirt.Connect.networkLookupByName(Unknown Source)
at com.cloud.hypervisor.kvm.resource.LibvirtComputingResource.configure(LibvirtComputingResource.java:1081)
at com.cloud.agent.Agent.
@yashi4engg it would be good to share all the logs of the job
We are able to create VMs now, and the hosts have also been added back to CloudStack. But we still have one question.
Is there a change between 4.18.0 and 4.18.1 that causes this issue? The same hypervisors were added to CloudStack in 4.18.0 without any change, but as soon as we upgraded to 4.18.1, even though the OS version remained the same and no OS files were updated, the hosts could not be added and needed a change to the host.OS property.
Expected: the hosts should be added back without any change, as they were added earlier with the same properties.
I agree with you @yashi4engg
any idea to fix it @DaanHoogland ? This is related to #7570
If I read this correctly, the file /etc/redhat-release was edited. This is not the correct procedure. Instead, the host details for the hosts in the cluster should be updated. I see this didn't make it into the release notes.
@DaanHoogland -- I agree with you, but we did that as a workaround. The host.OS property already showed Oracle in the DB, but the host was still unable to join the cluster, so we made this change and the host was able to join.
Do you suggest updating the host.OS property to Red Hat rather than changing the release file?
I would suggest editing the host detail in the database for the hosts in the cluster to match the contents of the redhat-release file. That way, freshly installed hosts should be able to join the cluster without further manipulation in /etc.
Can you share the original contents of /etc/redhat-release and the value that you replaced it with?
cat /etc/redhat-release
Red Hat Enterprise Linux release 9.2 (Plow)
cat /etc/oracle-release
Oracle Linux Server release 9.2
I have now fixed the issue by updating the Host.OS value in the DB, and I reverted the redhat-release contents to those of the default installation shown above.
The issue is resolved for us after updating the Host.OS value, but our concern is that this should not be needed in the general case: a host should be added back after an upgrade without any change.
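For anyone who wants to inspect the Host.OS value before changing it, here is a minimal sketch that only prints a read-only query. The table and column names (`host`, `host_details`, `host_id`, `cluster_id`, `removed`) are assumptions about the schema and should be verified against your own installation; `host_os_query` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch only: prints a read-only query; it does not connect to any database.
# Assumed schema (verify against your own installation): tables `host` and
# `host_details`, with the OS name stored under the detail name 'Host.OS'.

host_os_query() {
    local cluster_id="$1"
    # Accept only a plain integer so nothing unexpected ends up in the SQL.
    case "$cluster_id" in
        ''|*[!0-9]*) echo "cluster id must be numeric" >&2; return 1 ;;
    esac
    printf "SELECT h.id, h.name, d.value FROM host h JOIN host_details d ON d.host_id = h.id WHERE d.name = 'Host.OS' AND h.cluster_id = %s AND h.removed IS NULL;\n" "$cluster_id"
}

# Print the query for cluster 1 (the cluster id is an example value).
host_os_query 1
```

The printed statement can be reviewed and then pasted into a database client; take a backup before changing any row.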
@yashi4engg this is an omission in the installation notes. Every EL host whose /etc/redhat-release file contains more than one word before "release" should have that detail updated in the DB. I remember we discussed this, but it slipped through the cracks somehow. cc @shwstppr @mlsorensen @rohityadavcloud I'll start a doc PR for this.
On second thought, I'll first give it some thought as to whether it can be, or should have been, automated.
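To check whether a given host falls under the "more than one word before release" case, something like the following could be used. This is a sketch and is not taken from versions.sh; the function names are hypothetical, and the third sample line is only an illustrative single-word input.

```shell
#!/usr/bin/env bash
# Sketch only: not taken from versions.sh.

# Print the distribution name, i.e. the text before " release" in a release line.
name_before_release() {
    local line="$1"
    printf '%s\n' "${line%% release*}"
}

# Succeed when that name has more than one word.
has_multiword_name() {
    local name
    name="$(name_before_release "$1")"
    # Unquoted on purpose: let the shell split the name into words.
    set -- $name
    [ "$#" -gt 1 ]
}

# Try it on the two release lines posted in this thread, plus a one-word name.
for line in "Red Hat Enterprise Linux release 9.2 (Plow)" \
            "Oracle Linux Server release 9.2" \
            "AlmaLinux release 9.3"; do
    if has_multiword_name "$line"; then
        echo "$(name_before_release "$line"): multi-word name"
    else
        echo "$(name_before_release "$line"): single-word name"
    fi
done
```

On a real host the input would be the first line of /etc/redhat-release instead of the hard-coded strings.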
@DaanHoogland I suggest adding a list of compatible OSes
which includes
- Rocky / Rocky Linux
- Red / Red Hat Enterprise Linux
- AlmaLinux
If we read the version from /etc/oracle-release
when it exists, we could add
- Oracle Linux Server
Your PR would solve the issue completely as we can just add strings like "Red" and "Red Hat" in the list.
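As a rough illustration of the list idea, here is a sketch of such a check in shell. It is not the code from any CloudStack pull request; the function names are hypothetical, and it assumes that every name on the list is compatible with every other name on it.

```shell
#!/usr/bin/env bash
# Sketch only: not the code from any CloudStack pull request.
# Assumption: every name in the list is treated as compatible with every other.

is_el_compatible() {
    case "$1" in
        "Rocky"|"Rocky Linux"|"Red"|"Red Hat Enterprise Linux"|"AlmaLinux"|"Oracle Linux Server")
            return 0 ;;
        *)
            return 1 ;;
    esac
}

# Two hosts may share a cluster when their OS names are equal,
# or when both names are on the compatibility list.
can_share_cluster() {
    [ "$1" = "$2" ] && return 0
    is_el_compatible "$1" && is_el_compatible "$2"
}

# The pair from the exception at the top of this issue.
if can_share_cluster "Red Hat Enterprise Linux" "Oracle Linux Server"; then
    echo "compatible"
else
    echo "not compatible"
fi
```

With a list like this, the short and long spellings ("Red" and "Red Hat Enterprise Linux") would both be accepted, which is the point made in the comment above.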
I checked this in a bit more detail and found the file responsible for checking the hypervisor OS version, "/usr/share/cloudstack-common/scripts/vm/hypervisor/versions.sh". According to that file, it first looks at redhat-release and, if that exists, gets the details from there.
if [ -f /etc/redhat-release ] ; then
    get_from_redhat_release
    if [ -z "$REV" ] && [ -f /etc/os-release ]; then
        get_from_os_release
    fi
elif [ -f /etc/lsb-release ] ; then
    get_from_lsb_release
elif [ -f /etc/os-release ] ; then
    get_from_os_release
fi
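To see why /etc/oracle-release never comes into play with this branch order, here is a small sketch that reports which file would be consulted first. The `release_source` function and its root-directory parameter are hypothetical, added so the logic can be tried against a scratch directory. The sketch does not model the fallback to os-release when `$REV` is empty.

```shell
#!/usr/bin/env bash
# Sketch only: reports which file the branch order above would consult first.
# The root argument is a hypothetical parameter added here so the logic can be
# tried against a scratch directory instead of the real /etc.

release_source() {
    local root="${1:-}"
    if [ -f "$root/etc/redhat-release" ]; then
        echo "redhat-release"
    elif [ -f "$root/etc/lsb-release" ]; then
        echo "lsb-release"
    elif [ -f "$root/etc/os-release" ]; then
        echo "os-release"
    else
        echo "unknown"
    fi
}

# Scratch directory that mimics the host in this thread: both files present.
tmp="$(mktemp -d)"
mkdir -p "$tmp/etc"
echo "Red Hat Enterprise Linux release 9.2 (Plow)" > "$tmp/etc/redhat-release"
echo "Oracle Linux Server release 9.2" > "$tmp/etc/oracle-release"
release_source "$tmp"   # prints: redhat-release
rm -rf "$tmp"
```

Since oracle-release is not in the chain at all, a host that ships both files is identified from redhat-release.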
yes, this can be improved.
Fixed by https://github.com/apache/cloudstack/pull/8641