cloudstack Degraded cloudstack agent

problem

Over time, the CloudStack agent becomes increasingly slow when starting or stopping virtual machines. This performance degradation is especially noticeable during bulk VM operations, where the delays can become significant. The only effective workaround I've found so far is to restart the CloudStack agent service, which temporarily restores normal performance.

versions

the version is 4.20.1 Ubuntu 24 kvm

The steps to reproduce the bug

What to do about it?

Jul 03 '25 21:07 poli1025

Thanks for opening your first issue here! Be sure to follow the issue template!

Jul 03 '25 21:07 boring-cyborg[bot]

I have a similar problem, with other symptoms as well. For example, alerts in Cloudstack like "Health checks failed" for virtual routers on affected hosts. The agent is constantly consuming 100% CPU, even when there are no jobs or any current actions on that host.

Jul 16 '25 08:07 marekhorecny

@poli1025 is degradation specific only for the agent or is there overall slowness in handling start-stop API calls? (So we can rule out any MS issue) You mention bulk operations. Can you please give some insight on that? In the agent logs, do you see any related logs, like some stuck task of a background job giving an error? Would ut be possible for you provide logs and may be heap dump to analyze this?

Jul 16 '25 12:07 shwstppr

We’ve identified that the issue is specifically related to the agent — there is no noticeable delay in the API calls. We haven’t found any errors in the logs, but we are observing that the agent is progressively taking longer to perform the required actions.

The bulk operations are being executed via Terraform, such as the creation of 50 machines. The next day, when we destroy and recreate them, the process takes significantly more time. However, if we restart the agent, the performance returns to normal. Over time, the degradation happens again, and the creation time can eventually triple.

2025-07-28 22:21:59,724 INFO [kvm.storage.LibvirtStorageAdaptor] (AgentRequest-Handler-51:[]) (logid:) Trying to fetch storage pool 0f79bc6b-d365-3005-a4b7-8a41d336c6c6 from libvirt 2025-07-28 22:21:59,726 INFO [kvm.storage.LibvirtStorageAdaptor] (AgentRequest-Handler-51:[]) (logid:) Asking libvirt to refresh storage pool 0f79bc6b-d365-3005-a4b7-8a41d336c6c6 2025-07-28 22:23:00,459 INFO [kvm.storage.LibvirtStorageAdaptor] (AgentRequest-Handler-62:[]) (logid:) Trying to fetch storage pool 0f79bc6b-d365-3005-a4b7-8a41d336c6c6 from libvirt 2025-07-28 22:23:00,460 INFO [kvm.storage.LibvirtStorageAdaptor] (AgentRequest-Handler-62:[]) (logid:) Asking libvirt to refresh storage pool 0f79bc6b-d365-3005-a4b7-8a41d336c6c6 2025-07-28 22:23:00,863 INFO [kvm.storage.LibvirtStorageAdaptor] (AgentRequest-Handler-60:[]) (logid:) Trying to fetch storage pool 00b05d76-e83c-377e-9005-32f9a1d18679 from libvirt 2025-07-28 22:23:00,864 INFO [kvm.storage.LibvirtStorageAdaptor] (AgentRequest-Handler-60:[]) (logid:) Asking libvirt to refresh storage pool 00b05d76-e83c-377e-9005-32f9a1d18679 2025-07-28 22:23:00,918 INFO [kvm.storage.LibvirtStorageAdaptor] (AgentRequest-Handler-65:[]) (logid:) Trying to fetch storage pool e65cc751-30c7-3404-8932-d9d0c1577411 from libvirt 2025-07-28 22:23:00,919 INFO [kvm.storage.LibvirtStorageAdaptor] (AgentRequest-Handler-65:[]) (logid:) Asking libvirt to refresh storage pool e65cc751-30c7-3404-8932-d9d0c1577411 2025-07-28 22:23:01,420 INFO [kvm.storage.LibvirtStorageAdaptor] (AgentRequest-Handler-64:[]) (logid:) Trying to fetch storage pool 6aa94ea7-c6a7-344c-a11f-9a00654e1c23 from libvirt 2025-07-28 22:23:01,421 INFO [kvm.storage.LibvirtStorageAdaptor] (AgentRequest-Handler-64:[]) (logid:) Asking libvirt to refresh storage pool 6aa94ea7-c6a7-344c-a11f-9a00654e1c23 2025-07-28 22:23:01,824 INFO [kvm.storage.LibvirtStorageAdaptor] (AgentRequest-Handler-67:[]) (logid:) Trying to fetch storage pool 149ab1fd-3757-3bde-b359-60f522b8a17c from libvirt 2025-07-28 22:23:01,825 INFO [kvm.storage.LibvirtStorageAdaptor] (AgentRequest-Handler-67:[]) (logid:) Asking libvirt to refresh storage pool 149ab1fd-3757-3bde-b359-60f522b8a17c 2025-07-28 22:24:01,665 INFO [kvm.storage.LibvirtStorageAdaptor] (AgentRequest-Handler-55:[]) (logid:) Trying to fetch storage pool 00b05d76-e83c-377e-9005-32f9a1d18679 from libvirt 2025-07-28 22:24:01,667 INFO [kvm.storage.LibvirtStorageAdaptor] (AgentRequest-Handler-55:[]) (logid:) Asking libvirt to refresh storage pool 00b05d76-e83c-377e-9005-32f9a1d18679 2025-07-28 22:24:02,026 INFO [kvm.storage.LibvirtStorageAdaptor] (AgentRequest-Handler-68:[]) (logid:) Trying to fetch storage pool 0f79bc6b-d365-3005-a4b7-8a41d336c6c6 from libvirt 2025-07-28 22:24:02,028 INFO [kvm.storage.LibvirtStorageAdaptor] (AgentRequest-Handler-68:[]) (logid:) Asking libvirt to refresh storage pool 0f79bc6b-d365-3005-a4b7-8a41d336c6c6 2025-07-28 22:24:02,092 INFO [kvm.storage.LibvirtStorageAdaptor] (AgentRequest-Handler-75:[]) (logid:) Trying to fetch storage pool 5d8d05bd-451b-33be-ab65-b9d8e3f02528 from libvirt 2025-07-28 22:24:02,094 INFO [kvm.storage.LibvirtStorageAdaptor] (AgentRequest-Handler-75:[]) (logid:) Asking libvirt to refresh storage pool 5d8d05bd-451b-33be-ab65-b9d8e3f02528 2025-07-28 22:24:02,205 INFO [kvm.storage.LibvirtStorageAdaptor] (AgentRequest-Handler-61:[]) (logid:) Trying to fetch storage pool 6aa94ea7-c6a7-344c-a11f-9a00654e1c23 from libvirt 2025-07-28 22:24:02,207 INFO [kvm.storage.LibvirtStorageAdaptor] (AgentRequest-Handler-61:[]) (logid:) Asking libvirt to refresh storage pool 6aa94ea7-c6a7-344c-a11f-9a00654e1c23 2025-07-28 22:24:02,312 INFO [kvm.storage.LibvirtStorageAdaptor] (AgentRequest-Handler-70:[]) (logid:) Trying to fetch storage pool cacc288a-c318-3b0e-858d-1293480acec6 from libvirt 2025-07-28 22:24:02,313 INFO [kvm.storage.LibvirtStorageAdaptor] (AgentRequest-Handler-70:[]) (logid:) Asking libvirt to refresh storage pool cacc288a-c318-3b0e-858d-1293480acec6 2025-07-28 22:24:02,492 INFO [kvm.storage.LibvirtStorageAdaptor] (AgentRequest-Handler-72:[]) (logid:) Trying to fetch storage pool 00b05d76-e83c-377e-9005-32f9a1d18679 from libvirt 2025-07-28 22:24:02,493 INFO [kvm.storage.LibvirtStorageAdaptor] (AgentRequest-Handler-72:[]) (logid:) Asking libvirt to refresh storage pool 00b05d76-e83c-377e-9005-32f9a1d18679 2025-07-28 22:24:02,543 INFO [kvm.storage.LibvirtStorageAdaptor] (AgentRequest-Handler-73:[]) (logid:) Trying to fetch storage pool e65cc751-30c7-3404-8932-d9d0c1577411 from libvirt 2025-07-28 22:24:02,544 INFO [kvm.storage.LibvirtStorageAdaptor] (AgentRequest-Handler-73:[]) (logid:) Asking libvirt to refresh storage pool e65cc751-30c7-3404-8932-d9d0c1577411 2025-07-28 22:25:02,455 INFO [kvm.storage.LibvirtStorageAdaptor] (AgentRequest-Handler-83:[]) (logid:) Trying to fetch storage pool dec8139e-7163-3ea4-b488-cfce1f138e58 from libvirt 2025-07-28 22:25:02,456 INFO [kvm.storage.LibvirtStorageAdaptor] (AgentRequest-Handler-83:[]) (logid:) Asking libvirt to refresh storage pool dec8139e-7163-3ea4-b488-cfce1f138e58

Jul 28 '25 20:07 poli1025

I created 50 vm instances in parallel, the deployments and agents worked well

need more testing and investigation

Sep 01 '25 06:09 weizhouapache

@weizhouapache The initial deployments and agents work well. However, after a few days we observe degradation: VM creation, power cycling, and console access start to slow down.

Sep 01 '25 13:09 poli1025

The problem seems to be a leak in the handler threads while checking storage usage. The more agent threads you configure in agent.properties and the less time you configure to retrieve volume usage metrics, the worse it gets, and the faster it happens.

On a fresh start of the agent, you get the 'Trying to fetch storage pool xxxx from libvirt' message whenever the usage service is getting updated metrics. Those requests are either leaking or not getting garbage collected or something like that in time. Those requests start to overlap with time, and you end up seeing the same request to the same primary storage tens or hundreds of times. The only way to recover from that is to restart the agent, limit the number of threads of the agent and try to read the usage metrics in longer time spans (I think it defaults to 10 minutes or something like that, setting it to once every two hours mitigates it a bit, just enough so you don't have to restart the agent every few hours so it doesn't hog the kvm node cpu).

Here's a log with redacted storage uuids so it's easier to see (take a look at the timestamps) storage-log-no-uuids.log

This happens at least since Cloudstack 4.19

Sep 15 '25 15:09 vgarcia-linube

thanks for the information @vgarcia-linube

Sep 16 '25 13:09 weizhouapache

we also have this kind of issue. After a few days the agent stops executing tasks. After restarting the agent it starts working again.

ACS currently 4.20.1.0, upgrade to 4.20.2.0 is in planning
Ubuntu 24.04
Primary Storage is provided by a Netapp NFS Share
20-30 vms per host currently

Is there any useful debugging data i can collect when this error occurs?

Nov 07 '25 09:11 jgotteswinter

I wonder if this is the same issue I've been struggling with. That said, it seems at least in my case just restarting the agent isn't always enough, something gets 'hung up', like a lock, when the agent tries to connect to the management nodes which means sometimes I need to restart the management nodes as well.

Nov 07 '25 10:11 bradh352

I also had situations where the agent was restarted, up and running (at least from what i could see in the logs everything was fine). But it took several minutes for the agent to reconnect to the management servers. Usually this happens instantly.

Nov 07 '25 11:11 jgotteswinter

@jgotteswinter We've seen that too but it's usually pressure on other parts of the stack. For example, if the primary storage server is slow to respond, restarting the cloudstack-agent may need more time to initialize as it loops through the storage domains.

On this specific issue, we've reconfigured the agent to set its logs to debug but we didn't find any more clues than the logs we shared here

With about 500 instances in our clusters, we're seeing this event once every two days approximately, but time varies on node capability and number of machines concurrently running on that specific node

Nov 07 '25 11:11 vgarcia-linube

@vgarcia-linube on the settings that helped you said:

limit the number of threads of the agent and try to read the usage metrics in longer time spans (I think it defaults to 10 minutes or something like that, setting it to once every two hours mitigates it a bit, just enough so you don't have to restart the agent every few hours so it doesn't hog the kvm node cpu

Can you provide the exact setting names (and where, such as agent.properties or something in management config) and values you used that helped? I'd like to see if that helps me.

Also does preemptively restarting the agent before it gets too bad in something like cron.daily prevent the issue from occurring?

Nov 07 '25 11:11 bhouse-nexthop

@bhouse-nexthop i configured a daily cron which restarts the agent every night a few days ago. So far, it did not show about again. @vgarcia-linube i also enabled debug on one of my compute nodes, with same result.

i was thinking about configuring jmx, maybe this could give more information?

JAVA_OPTS="$JAVA_OPTS -Dcom.sun.management.jmxremote=true" JAVA_OPTS="$JAVA_OPTS -Dcom.sun.management.jmxremote.port=3344" # Choose an unused port, e.g., 3344 JAVA_OPTS="$JAVA_OPTS -Dcom.sun.management.jmxremote.ssl=false" # Enable SSL in production JAVA_OPTS="$JAVA_OPTS -Dcom.sun.management.jmxremote.authenticate=false" # Add auth in production

Nov 07 '25 11:11 jgotteswinter

we also have this kind of issue. After a few days the agent stops executing tasks. After restarting the agent it starts working again.

ACS currently 4.20.1.0, upgrade to 4.20.2.0 is in planning

Ubuntu 24.04

Primary Storage is provided by a Netapp NFS Share

20-30 vms per host currently

Is there any useful debugging data i can collect when this error occurs?

@jgotteswinter can you share the number of storage pools on the hosts ?

Nov 07 '25 12:11 weizhouapache

@vgarcia-linube on the settings that helped you said:

limit the number of threads of the agent and try to read the usage metrics in longer time spans (I think it defaults to 10 minutes or something like that, setting it to once every two hours mitigates it a bit, just enough so you don't have to restart the agent every few hours so it doesn't hog the kvm node cpu

Can you provide the exact setting names (and where, such as agent.properties or something in management config) and values you used that helped? I'd like to see if that helps me.

Also does preemptively restarting the agent before it gets too bad in something like cron.daily prevent the issue from occurring?

could check stats.interval in global settings

Nov 07 '25 12:11 weizhouapache

@weizhouapache there is currently only one primary storage and one secondary. Both NFS

Nov 07 '25 12:11 jgotteswinter

I also see this issue frequently, day to day vm operation including starting, stopping, getting console, migrating seems to be affected by it and this happens only on specific compute node while the others work without similar issue. For me most of the time i don't even need to restart the cloudstack-agent, just force reconnecting the host from cloudstack management ui resolves the issue. But i haven't found a way to properly replicate/recreate the issue from my side.

Nov 07 '25 17:11 TadiosAbebe

@TadiosAbebe thats an interesting observation, i will try that.

Some of our instances run k8s with the leaseweb csi driver, we had deployments which triggered a lot of block volume actions. I think this frequently caused this problem. Meanwhile we got rid of these deployments, the problem still shows up but less frequent.

Nov 08 '25 06:11 jgotteswinter

That's an interesting angle on this issue. I've got a test cluster, but it has more nodes (8 -- its an old supermicro microcloud 3U) and fewer vms, but I do NOT run k8s on that and its run for many weeks without seeing this issue (I've never seen it on this cluster).

But on another cluster that is much newer and more powerful but only has 3 nodes, is fairly heavily loaded with about 30 vms per node, but also runs k8s, I see this issue every 2 days ...

I hadn't thought to look at the k8s angle, but it is an additional difference over the load aspect. I wonder if I can reproduce by deploying a k8s cluster in the test environment.

Since someone else asked about the primary and secondary storage, I use Ceph RBD for primary and NFS (via Ganesha NFS on top of CephFS) for secondary storage ... its set up the same on both the test and prod clusters.

Nov 08 '25 15:11 bhouse-nexthop

@vgarcia-linube on the settings that helped you said:

limit the number of threads of the agent and try to read the usage metrics in longer time spans (I think it defaults to 10 minutes or something like that, setting it to once every two hours mitigates it a bit, just enough so you don't have to restart the agent every few hours so it doesn't hog the kvm node cpu

Can you provide the exact setting names (and where, such as agent.properties or something in management config) and values you used that helped? I'd like to see if that helps me. Also does preemptively restarting the agent before it gets too bad in something like cron.daily prevent the issue from occurring?

could check stats.interval in global settings

@vgarcia-linube can you elaborate on which stats you changed? There are a lot of stats.interval settings in the global config, so I'm thinking you didnt' change all of those did you? "Usage metrics" is a bit generic to know what you are referring to. You did mention ones that default to 10 minute time spans, the only one I see that matches that is volume.stats.interval.

Nov 08 '25 15:11 bhouse-nexthop

The ones that seemed to do the trick were: In cloudstack agent config: /etc/cloudstack/agent/agent.properties: workers -> 3 In global config in Cloudstack itself: storage.stats.interval: 120000

The more workers configured in cloudstack-agent, the faster the error showed up. Our clusters handle about ~600 vms, currently a mix of standalone instances and kubernetes nodes, but this happened long before we had any kubernetes cluster running

Nov 10 '25 06:11 vgarcia-linube

just as a note about my k8s statement. I dont think its k8s itself, the cloudstack csi driver (leaseweb one, in my case) is suspicious to me.

Nov 10 '25 07:11 jgotteswinter

I am beginning to doubt whether this issue is related to the CloudStack agent. Over the past few weeks, I’ve been looking more closely at the problem. As I mentioned earlier, I encounter it fairly frequently, and it affects basic VM operations such as creating, starting, stopping, and accessing the console.

It turns out my earlier assumption was incorrect and force reconnecting the compute host does not consistently resolve the issue. This time, instead of reconnecting the host or restarting the cloudstack-agent service, I tried restarting the libvirtd service whenever i encounter the issue. I tested this several times, and in every case, simply restarting libvirtd (without restarting CloudStack agent) resolved the issue.

I haven't seen any noticeable differences in the libvirtd logs compared to a working host, so it’s difficult to pinpoint the root cause. Since this doesn’t appear to be a widely reported problem, I suspect it may be environmental. All my compute hosts are running Ubuntu Server 24.04, and the environment uses a hyperconverged Ceph setup.

Nov 23 '25 21:11 TadiosAbebe

@TadiosAbebe my environment matches yours .... ubuntu 24.04, hypercoverged ceph. And the reported behavior also matches.

That said, using the force reconnect never did work for me as a resolution. And when a host got in that state, just restarting the agents wouldn't work for me and I'm not exactly sure why. I never did fully isolate it. Sometimes I had to restart the management servers which was really concerning, I think I tried restarting libvirtd too.

I set up a cron job to restart the agent daily at 5 min intervals across hosts: https://github.com/bradh352/ansible-role-service-cloudstack/commit/fdb14d92f115a5845d95880a05f467b42a84aff1

This has helped considerably. That said, I just went through to "test" everything right now to see if everything was healthy, which entails trying to open the console on one vm per host, and I found a host in a bad state ... ugh. So restarting the agent daily is definitely not a resolution. Out of curiosity I decided to restart libvirtd, and that did fix it on the bad host.

Nov 23 '25 22:11 bradh352

@bradh352 , opening the console is also my first step when checking whether a compute host is experiencing issues. Over the past few weeks, I ran several tests across my five compute hosts (all using Ceph) to better understand the behavior.

Initial Tests

These tests were performed a few days after restarting both cloudstack-agent and libvirtd.

host	vm_create(in sec)	vm_stop(in sec)	vm_start(in sec)	vm_console(in sec)
host1	86	30	60	5
host2	37	18	28	4
host3	54	9	33	1
host4	49	3	19	0
host5	12	3	19	0

After Restarting libvirtd

I restarted libvirtd on all hosts and repeated the same tests.

host	vm_create(in sec)	vm_stop(in sec)	vm_start(in sec)
host1	21	3	19
host2	22	3	19
host3	25	3	19
host4	21	7	19
host5	22	3	18

Same Tests Again (After ~2 Days)

host	vm_create(in sec)	vm_stop(in sec)	vm_start(in sec)	vm_console(in sec)
host1	31	9	21	1
host2	28	3	18	0
host3	28	6	23	1
host4	30	3	19	0
host5	26	3	18	0

These results may not be perfectly accurate due to different factors, but the performance differences are still notable especially the improvement immediately after restarting libvirtd.

My main concern with scheduling an automated restart of cloudstack-agent was, if it restarts in the middle of an operation, it may leave behind orphaned resources. I believe this is how I ended up with volumes stuck in Destroy state and a virtual router stuck in Expunging, which I’m still unable to fully remove.

It would be helpful to know whether restarting libvirtd is truly the reliable fix for you too. Even if that’s the case, I still don’t understand why this degradation happens over time.

Nov 24 '25 07:11 TadiosAbebe

I was able to collect a java thread dump while the error happened, no idea if this is helpful. Can anyone give a hint what kind of debug information would be helpful?

agent-stuck.txt

Nov 24 '25 07:11 jgotteswinter

thanks for the information @bradh352 @TadiosAbebe @jgotteswinter

I suspected it is due to that all libvirt operations use the same instance of LibvirtConnection. a libvirt operation might delay other libvirt operations. we could introduce a thread pool of LibvirtConnection if it supports concurrent operations.

however, I do not know why the issue became very often in recent releases. can everyone share the below ?

global configurations (cmk list configurations filter=id,name,value keyword=stats.interval)
java version of kvm host (java --version)
OS distribution of kvm host (cat /etc/os-release)
libvirt and qemu version of kvm host (virsh version)
Approx. number of VMs on kvm host

Nov 24 '25 08:11 weizhouapache

{
  "configuration": [
    {
      "name": "autoscale.stats.interval",
      "value": "60"
    },
    {
      "name": "cluster.heartbeat.threshold",
      "value": "150000"
    },
    {
      "name": "database.server.stats.interval",
      "value": "60"
    },
    {
      "name": "database.server.stats.retention",
      "value": "3"
    },
    {
      "name": "direct.network.stats.interval",
      "value": "86400"
    },
    {
      "name": "external.network.stats.interval",
      "value": "300"
    },
    {
      "name": "host.stats.interval",
      "value": "60000"
    },
    {
      "name": "management.server.stats.interval",
      "value": "60"
    },
    {
      "name": "router.stats.interval",
      "value": "300"
    },
    {
      "name": "storage.stats.interval",
      "value": "60000"
    },
    {
      "name": "storpool.storage.stats.interval",
      "value": "3600"
    },
    {
      "name": "storpool.volumes.stats.interval",
      "value": "3600"
    },
    {
      "name": "vm.disk.stats.interval",
      "value": "0"
    },
    {
      "name": "vm.disk.stats.interval.min",
      "value": "300"
    },
    {
      "name": "vm.network.stats.interval",
      "value": "0"
    },
    {
      "name": "vm.network.stats.interval.min",
      "value": "300"
    },
    {
      "name": "vm.stats.interval",
      "value": "60000"
    },
    {
      "name": "volume.stats.interval",
      "value": "600000"
    }
  ],
  "count": 18
}

xx@xxx:~# java --version
openjdk 17.0.16 2025-07-15
OpenJDK Runtime Environment (build 17.0.16+8-Ubuntu-0ubuntu124.04.1)
OpenJDK 64-Bit Server VM (build 17.0.16+8-Ubuntu-0ubuntu124.04.1, mixed mode, sharing)

xxx@xxx:~# cat /etc/os-release
PRETTY_NAME="Ubuntu 24.04.3 LTS"
NAME="Ubuntu"
VERSION_ID="24.04"
VERSION="24.04.3 LTS (Noble Numbat)"
VERSION_CODENAME=noble
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=noble
LOGO=ubuntu-logo


xxx@xxx:~# virsh version
Compiled against library: libvirt 10.0.0
Using library: libvirt 10.0.0
Using API: QEMU 10.0.0
Running hypervisor: QEMU 8.2.2```

xxx@xxx:~# ps auxf |grep qemu-system |wc -l
25

This is our development environment, there is not so much activity.

Nov 24 '25 09:11 jgotteswinter

thanks @jgotteswinter cc @weizhouapache I did take a quick look at the thread dump you shared. I'm seeing some hot/suspected threads (Roughly 2.1% of one core on average over the entire uptime),

"AgentOutRequest-Handler-2" #88 prio=5 os_prio=0 cpu=20647388.02ms elapsed=949378.81s tid=0x00007cc18814b210 nid=0xe5e2d runnable  [0x00007cc0131f0000]
   java.lang.Thread.State: RUNNABLE
	at com.sun.jna.Native.invokePointer(Native Method)
	at com.sun.jna.Function.invokePointer(Function.java:497)
	at com.sun.jna.Function.invoke(Function.java:441)
	at com.sun.jna.Function.invoke(Function.java:361)
	at com.sun.jna.Library$Handler.invoke(Library.java:265)
	at jdk.proxy2.$Proxy17.virConnectGetHostname(jdk.proxy2/Unknown Source)
	at org.libvirt.Connect.getHostName(Unknown Source)
	at com.cloud.hypervisor.kvm.resource.LibvirtComputingResource.getHostVmStateReport(LibvirtComputingResource.java:4050)
	at com.cloud.hypervisor.kvm.resource.LibvirtComputingResource.getHostVmStateReport(LibvirtComputingResource.java:4005)
	at com.cloud.hypervisor.kvm.resource.LibvirtComputingResource.getCurrentStatus(LibvirtComputingResource.java:3703)
	at com.cloud.agent.Agent.processOtherTask(Agent.java:984)
	at com.cloud.agent.Agent$ServerHandler.doTask(Agent.java:1234)
	at com.cloud.utils.nio.Task.call(Task.java:83)
	at com.cloud.utils.nio.Task.call(Task.java:29)
	at java.util.concurrent.FutureTask.run([email protected]/FutureTask.java:264)
	at java.util.concurrent.ThreadPoolExecutor.runWorker([email protected]/ThreadPoolExecutor.java:1136)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run([email protected]/ThreadPoolExecutor.java:635)
	at java.lang.Thread.run([email protected]/Thread.java:840)

Could it be something related to libvirt 10.0.0? (Because I've a running env for many days with libvirt 8.0 and I don't see this there. Though it is also OL8 host). Maybe if others experiencing this issue can share the libvirt version to get some idea

Nov 24 '25 09:11 shwstppr