cloudstack icon indicating copy to clipboard operation
cloudstack copied to clipboard

Delay of connection to vm console

Open SviridoffA opened this issue 8 months ago • 32 comments

problem

On the first launch, the connection is established immediately. But after some time delay starting to grow. With a cluster running for 7 days, connection delay can reach approximately 4 seconds, and continues to increase over time. But after restarting Cloudstack agent on the KVM host where the virtual machine we’re connecting to is located, the delay disappears and the connection becomes instant again. The problem returns over time.

versions

ACS - 4.19.2 (cloudstack agent with the same version), issue was also observed in version 4.19.1 Hypervisor - KVM Storage - Linstor Network - Bridges (vlan isolation)

The steps to reproduce the bug

Just let the cluster run for 7-14 days, then try checking the delay. Restarting the agent will temporarily resolve the issue.

What to do about it?

It would be great if this issue could be resolved, as we’d like to avoid restarting the agents via cron.

SviridoffA avatar Apr 02 '25 11:04 SviridoffA

@SviridoffA does the agent mean the cloudstack-agent service on kvm host? or the cloud service in CPVM ?

weizhouapache avatar Apr 02 '25 12:04 weizhouapache

Hi @weizhouapache , I mean restarting cloudstack-agent service on kvm host.

SviridoffA avatar Apr 02 '25 12:04 SviridoffA

Some additional information. All infrastracture build on ubuntu 24.04. Configuration of kvm host network described on picture below.

Management ip configured on cloudbr0, all other traffic types goes directly on bond0. Agent creates guest network bridges directly on bond0, and same for public network. And as a result, bridge cloudbr1 is not used now. I tried to set public.network.device and guest.network.device to cloudbr1, But cloudstack-agent ignoring it. But it doesn't matter, everything works and it suits me.

I just added this here in order to demonstrate my configuration and exclude possible mistakes in it.

SviridoffA avatar Apr 02 '25 18:04 SviridoffA

@SviridoffA just to confirm, does 4 seconds mean the wait time when you open the vm console, or the delay of mouse or keyboard events in the vm console ?

weizhouapache avatar Apr 15 '25 13:04 weizhouapache

@weizhouapache It's delay from click on console icon to open browser tab with console. Over the past month, I’ve been recording delays of up to 8 seconds. I restarted the agent on all nodes on the day this report was created. GitHub shows that two weeks have passed, and this time there is an increase in delay, but it’s less than usual. Currently, this delay is about 2 seconds. This might be related to the fact that I haven’t been very active in the UI over the past few weeks. Yesterday, I restarted agents on 2 of my 4 nodes, and for this 2 nodes browser tab opens instantly, while on the ones where I didn’t restart, it takes about two seconds.

SviridoffA avatar Apr 15 '25 13:04 SviridoffA

@weizhouapache This might not be a problem if some kind of loader was present instead of the icon on click. Currently, some users don’t understand that browser is waiting for a response. They end up clicking multiple times. When the browser finally receives the response, user gets several tabs open simultaneously, some of which are non-functional, and only the last-requested tab works. If needed, I can wait for a more significant delay and record a short video demonstrating how the console opens and how restarting of agents affects this behavior.

SviridoffA avatar Apr 15 '25 13:04 SviridoffA

@weizhouapache It's delay from click on console icon to open browser tab with console. Over the past month, I’ve been recording delays of up to 8 seconds. I restarted the agent on all nodes on the day this report was created. GitHub shows that two weeks have passed, and this time there is an increase in delay, but it’s less than usual. Currently, this delay is about 2 seconds. This might be related to the fact that I haven’t been very active in the UI over the past few weeks. Yesterday, I restarted agents on 2 of my 4 nodes, and for this 2 nodes browser tab opens instantly, while on the ones where I didn’t restart, it takes about two seconds.

@SviridoffA can you open a vm console, and search the agent.log by keyword "GetVncPort" ?

If the agent.log does not contain DEBUG info, you need to enable it

weizhouapache avatar Apr 15 '25 13:04 weizhouapache

@SviridoffA can you open a vm console, and search the agent.log by keyword "GetVncPort" ?

If the agent.log does not contain DEBUG info, you need to enable it

@weizhouapache I connected to the VM console, but unfortunately I couldn’t find any records for that keyword in /var/log/cloud.log. The default logging level is likely INFO. Could you please advise me on how to properly enable DEBUG logging? Unfortunately, I can’t find where this is configured. Is it in /usr/local/cloud/systemvm/conf/agent.properties or somewhere else?

Currently, in the log file, I see that Java exceptions are occurring periodically in recent days. They look like this:

2025-04-14 08:36:40,728 ERROR [vnc.network.NioSocket] (Thread-14042:null) Error reading from socket channel: newPosition < 0: (-1 < 0)
java.lang.IllegalArgumentException: newPosition < 0: (-1 < 0)
        at java.base/java.nio.Buffer.createPositionException(Buffer.java:318)
        at java.base/java.nio.Buffer.position(Buffer.java:293)
        at java.base/java.nio.ByteBuffer.position(ByteBuffer.java:1094)
        at com.cloud.consoleproxy.vnc.network.NioSocket.readFromSocketChannel(NioSocket.java:105)
        at com.cloud.consoleproxy.vnc.network.NioSocketInputStream.readBytesToBuffer(NioSocketInputStream.java:63)
        at com.cloud.consoleproxy.vnc.network.NioSocketInputStream.rearrangeBufferToFitSize(NioSocketInputStream.java:81)
        at com.cloud.consoleproxy.vnc.network.NioSocketInputStream.getReadBytesAvailableToFitSize(NioSocketInputStream.java:36)
        at com.cloud.consoleproxy.vnc.network.NioSocketSSLEngineManager.read(NioSocketSSLEngineManager.java:145)
        at com.cloud.consoleproxy.vnc.network.NioSocketTLSInputStream.readFromSSLEngineManager(NioSocketTLSInputStream.java:38)
        at com.cloud.consoleproxy.vnc.network.NioSocketTLSInputStream.rearrangeBufferToFitSize(NioSocketTLSInputStream.java:63)
        at com.cloud.consoleproxy.vnc.network.NioSocketInputStream.getReadBytesAvailableToFitSize(NioSocketInputStream.java:36)
        at com.cloud.consoleproxy.vnc.network.NioSocketInputStream.readAndCopyUnsignedInteger(NioSocketInputStream.java:95)
        at com.cloud.consoleproxy.vnc.network.NioSocketInputStream.readServerInit(NioSocketInputStream.java:134)
        at com.cloud.consoleproxy.vnc.network.NioSocketHandlerImpl.readServerInit(NioSocketHandlerImpl.java:94)
        at com.cloud.consoleproxy.vnc.NoVncClient.readServerInit(NoVncClient.java:497)
        at com.cloud.consoleproxy.ConsoleProxyNoVncClient.initialisationPhase(ConsoleProxyNoVncClient.java:247)
        at com.cloud.consoleproxy.ConsoleProxyNoVncClient.authenticateVNCServerThroughNioSocket(ConsoleProxyNoVncClient.java:233)
        at com.cloud.consoleproxy.ConsoleProxyNoVncClient.authenticateToVNCServer(ConsoleProxyNoVncClient.java:189)
        at com.cloud.consoleproxy.ConsoleProxyNoVncClient$1.run(ConsoleProxyNoVncClient.java:108)
        at java.base/java.lang.Thread.run(Thread.java:829

SviridoffA avatar Apr 15 '25 15:04 SviridoffA

@SviridoffA can you open a vm console, and search the agent.log by keyword "GetVncPort" ? If the agent.log does not contain DEBUG info, you need to enable it

@weizhouapache I connected to the VM console, but unfortunately I couldn’t find any records for that keyword in /var/log/cloud.log. The default logging level is likely INFO. Could you please advise me on how to properly enable DEBUG logging? Unfortunately, I can’t find where this is configured. Is it in /usr/local/cloud/systemvm/conf/agent.properties or somewhere else?

can you search the agent.log on the kvm host where the vm is running on ?

weizhouapache avatar Apr 15 '25 15:04 weizhouapache

can you search the agent.log on the kvm host where the vm is running on ?

I opened KVM node and found the virtual machine that’s running on it. I connected to console of this vm several times. Then I went to /var/log/cloudstack/agent/agent.log and didn’t find a single line that contained word “GetVncPort.” After that, I went to /etc/cloudstack/agent/log4j-cloud.xml and changed the following values:

<!-- ======================= -->
   <!-- Setup the Root category -->
   <!-- ======================= -->

   <root>
      <level value="DEBUG"/>
      <appender-ref ref="CONSOLE"/>
      <appender-ref ref="FILE"/>
   </root>
<!-- ================ -->
   <!-- Limit categories -->
   <!-- ================ -->

   <category name="com.cloud">
     <priority value="DEBUG"/>
   </category>
<log4j:configuration xmlns:log4j="http://jakarta.apache.org/log4j/" debug="true">

After that, I restarted cloudstack-agent. I connected to the VM console several more times and still didn’t find “GetVncPort”. Also, I don’t see any messages with the DEBUG level. Only INFO and WARN. I ran tail -f and I don’t see agent.log reacting to my console connections.

SviridoffA avatar Apr 15 '25 16:04 SviridoffA

After that, I restarted cloudstack-agent. I connected to the VM console several more times and still didn’t find “GetVncPort”. Also, I don’t see any messages with the DEBUG level. Only INFO and WARN. I ran tail -f and I don’t see agent.log reacting to my console connections.

never mind @SviridoffA

can you search on the management server ?

egrep 'GetVnc|ConsoleAccessManagerImpl' /var/log/cloudstack/management/management-server.log

please share the logs for the last vm

weizhouapache avatar Apr 15 '25 17:04 weizhouapache

please share the logs for the last vm

Output from last several connections:

2025-04-15 20:49:06,524 DEBUG [c.c.a.m.ClusteredAgentAttache] (qtp510109769-17683:ctx-818fd8c0 ctx-92243304) (logid:46c296c1) Seq 3-2115284449980614496: Forwarding Seq 3-2115284449980614496:  { Cmd , MgmtId: 345049528973, via: 3(node3), Ver: v1, Flags: 100011, [{"com.cloud.agent.api.GetVncPortCommand":{"id":"18","name":"i-2-18-VM","wait":"0","bypassHostMaintenance":"false"}}] } to 345049509737
2025-04-15 20:49:07,351 DEBUG [c.c.a.t.Request] (AgentManager-Handler-11:null) (logid:) Seq 3-2115284449980614496: Processing:  { Ans: , MgmtId: 345049528973, via: 3, Ver: v1, Flags: 10, [{"com.cloud.agent.api.GetVncPortAnswer":{"address":"10.66.50.53","port":"5900","result":"true","wait":"0","bypassHostMaintenance":"false"}}] }
2025-04-15 20:49:07,352 DEBUG [c.c.a.t.Request] (qtp510109769-17683:ctx-818fd8c0 ctx-92243304) (logid:46c296c1) Seq 3-2115284449980614496: Received:  { Ans: , MgmtId: 345049528973, via: 3(node3), Ver: v1, Flags: 10, { GetVncPortAnswer } }
2025-04-15 20:49:07,352 DEBUG [o.a.c.c.ConsoleAccessManagerImpl] (qtp510109769-17683:ctx-818fd8c0 ctx-92243304) (logid:46c296c1) Port info 10.66.50.53
2025-04-15 20:49:07,352 INFO  [o.a.c.c.ConsoleAccessManagerImpl] (qtp510109769-17683:ctx-818fd8c0 ctx-92243304) (logid:46c296c1) Parse host info returned from executing GetVNCPortCommand. host info: 10.66.50.53
2025-04-15 20:49:07,360 DEBUG [o.a.c.c.ConsoleAccessManagerImpl] (qtp510109769-17683:ctx-818fd8c0 ctx-92243304) (logid:46c296c1) Compose console url: https://<HidenExternalAddress>/resource/noVNC/vnc.html?autoconnect=true&port=8443&token=R2a0f9b1EP6b4TZkNkTw8Q+ZXQyqbEGcWEW63a/JP+tKqhJMBSJUy462gNbUFLPDDAhjdC607IXKGlZAn7XFPK9UxsaDaSiGwGicw9MBT+BfnQ7DcGq/cAoH2udr6xRs7HYmeVbiCMsCDbNOszuKQHiXU4vR0yujFcRAEisiVAY5VPXlEPtXS7jnUv6IwVOuMrO/tikLHgxMTxMgUAauoon8OdGe9QlPxD7Q2c3142ybwpJJsixqvDzmF9DErJTi5Jds2V/yg2cOUxmW6hzfTA7A0o74RGfmNRsKgp6rW/Z23T3sVnbjDMwey9tF0AwoKh4qbGMSlPg1JROFdKqvuUb3Fr1tr3qdzCDLKQX/AL9H8f9AS6IJNVagcQN4He5OwL7FRo0NffxFTJIL/HTnGie0foLngnQIX6imFpAN6mxy4yIkIyX3fWs6x2agCdBLKqZYP2uMhtCTkNwS+M28Qz041qGJwtKkNTf+mJLUBWkgsHJF/oUZ71TDETwZBqFk0zJlNLm+NivFiEYggANjJbRG
2025-04-15 20:49:07,360 DEBUG [o.a.c.c.ConsoleAccessManagerImpl] (qtp510109769-17683:ctx-818fd8c0 ctx-92243304) (logid:46c296c1) Adding allowed session: 8d1c7638-c5c1-404f-a5ee-196b802b9d0d
2025-04-15 20:49:07,442 DEBUG [o.a.c.c.ConsoleAccessManagerImpl] (qtp510109769-17683:ctx-818fd8c0 ctx-92243304) (logid:46c296c1) The console URL is: https://<HidenExternalAddress>/resource/noVNC/vnc.html?autoconnect=true&port=8443&token=R2a0f9b1EP6b4TZkNkTw8Q+ZXQyqbEGcWEW63a/JP+tKqhJMBSJUy462gNbUFLPDDAhjdC607IXKGlZAn7XFPK9UxsaDaSiGwGicw9MBT+BfnQ7DcGq/cAoH2udr6xRs7HYmeVbiCMsCDbNOszuKQHiXU4vR0yujFcRAEisiVAY5VPXlEPtXS7jnUv6IwVOuMrO/tikLHgxMTxMgUAauoon8OdGe9QlPxD7Q2c3142ybwpJJsixqvDzmF9DErJTi5Jds2V/yg2cOUxmW6hzfTA7A0o74RGfmNRsKgp6rW/Z23T3sVnbjDMwey9tF0AwoKh4qbGMSlPg1JROFdKqvuUb3Fr1tr3qdzCDLKQX/AL9H8f9AS6IJNVagcQN4He5OwL7FRo0NffxFTJIL/HTnGie0foLngnQIX6imFpAN6mxy4yIkIyX3fWs6x2agCdBLKqZYP2uMhtCTkNwS+M28Qz041qGJwtKkNTf+mJLUBWkgsHJF/oUZ71TDETwZBqFk0zJlNLm+NivFiEYggANjJbRG
2025-04-15 20:49:27,341 DEBUG [c.c.a.m.ClusteredAgentAttache] (qtp510109769-17683:ctx-70c5f54f ctx-c4a94529) (logid:5d6271ec) Seq 3-2115284449980614497: Forwarding Seq 3-2115284449980614497:  { Cmd , MgmtId: 345049528973, via: 3(node3), Ver: v1, Flags: 100011, [{"com.cloud.agent.api.GetVncPortCommand":{"id":"19","name":"i-2-19-VM","wait":"0","bypassHostMaintenance":"false"}}] } to 345049509737
2025-04-15 20:49:28,175 DEBUG [c.c.a.t.Request] (AgentManager-Handler-14:null) (logid:) Seq 3-2115284449980614497: Processing:  { Ans: , MgmtId: 345049528973, via: 3, Ver: v1, Flags: 10, [{"com.cloud.agent.api.GetVncPortAnswer":{"address":"10.66.50.53","port":"5901","result":"true","wait":"0","bypassHostMaintenance":"false"}}] }
2025-04-15 20:49:28,176 DEBUG [c.c.a.t.Request] (qtp510109769-17683:ctx-70c5f54f ctx-c4a94529) (logid:5d6271ec) Seq 3-2115284449980614497: Received:  { Ans: , MgmtId: 345049528973, via: 3(node3), Ver: v1, Flags: 10, { GetVncPortAnswer } }
2025-04-15 20:49:28,176 DEBUG [o.a.c.c.ConsoleAccessManagerImpl] (qtp510109769-17683:ctx-70c5f54f ctx-c4a94529) (logid:5d6271ec) Port info 10.66.50.53
2025-04-15 20:49:28,176 INFO  [o.a.c.c.ConsoleAccessManagerImpl] (qtp510109769-17683:ctx-70c5f54f ctx-c4a94529) (logid:5d6271ec) Parse host info returned from executing GetVNCPortCommand. host info: 10.66.50.53
2025-04-15 20:49:28,183 DEBUG [o.a.c.c.ConsoleAccessManagerImpl] (qtp510109769-17683:ctx-70c5f54f ctx-c4a94529) (logid:5d6271ec) Compose console url: https://<HidenExternalAddress>/resource/noVNC/vnc.html?autoconnect=true&port=8443&token=9rcsEUfmOXY7LTY2EKRhmemKEdwokYbklyAhhzjHJggejPHOWf/Bs35KSl6AYE10B1FFgUeGp+C5w/je0kLo490L0vOnSvv7UAs6PXuW8dOthWZIB4WX7v5r49NYaufJWyKhuFm7YOvwyPGHnjpIfUzr1XRIIOMa0Nm0Mk1lfAwe9SlrerHHyivzf71xSii+ox4ZfC1+kx9NLkj//DOCHmUzMBakvt4YU9/RI0jJSqZLXky90j6g7YvHbBqOH4lwy+iEdwoFO+Lmu3MuRC7/J277L+uAeUKrw8jfymsosB8M8TmjE7kUOR2SE36OHG96gLU1M1X5F3M2JcDjpv4zYqiVqIgGyPup5VLuIHfDENTxcdV4drt9jM5iWux0WkBPlyN0COJHp40gWqt4QzUTArPK4kBjIdSr+YrQdHfhEFAAX4eYfDoeJDEqcdpvtuRQ3kJa0SKZMdaiQcI1Pz0YWhmVVSjCM3dlc2MJWQGze1GdteDEWJfgcK6GH2Gqc9GSJFMkzKa9Fv38tQ/q7FHRASs8
2025-04-15 20:49:28,183 DEBUG [o.a.c.c.ConsoleAccessManagerImpl] (qtp510109769-17683:ctx-70c5f54f ctx-c4a94529) (logid:5d6271ec) Adding allowed session: 866aea1b-ab7d-46f8-8110-521bf5f86e56
2025-04-15 20:49:28,273 DEBUG [o.a.c.c.ConsoleAccessManagerImpl] (qtp510109769-17683:ctx-70c5f54f ctx-c4a94529) (logid:5d6271ec) The console URL is: https://<HidenExternalAddress>/resource/noVNC/vnc.html?autoconnect=true&port=8443&token=9rcsEUfmOXY7LTY2EKRhmemKEdwokYbklyAhhzjHJggejPHOWf/Bs35KSl6AYE10B1FFgUeGp+C5w/je0kLo490L0vOnSvv7UAs6PXuW8dOthWZIB4WX7v5r49NYaufJWyKhuFm7YOvwyPGHnjpIfUzr1XRIIOMa0Nm0Mk1lfAwe9SlrerHHyivzf71xSii+ox4ZfC1+kx9NLkj//DOCHmUzMBakvt4YU9/RI0jJSqZLXky90j6g7YvHbBqOH4lwy+iEdwoFO+Lmu3MuRC7/J277L+uAeUKrw8jfymsosB8M8TmjE7kUOR2SE36OHG96gLU1M1X5F3M2JcDjpv4zYqiVqIgGyPup5VLuIHfDENTxcdV4drt9jM5iWux0WkBPlyN0COJHp40gWqt4QzUTArPK4kBjIdSr+YrQdHfhEFAAX4eYfDoeJDEqcdpvtuRQ3kJa0SKZMdaiQcI1Pz0YWhmVVSjCM3dlc2MJWQGze1GdteDEWJfgcK6GH2Gqc9GSJFMkzKa9Fv38tQ/q7FHRASs8
2025-04-15 20:49:39,427 DEBUG [c.c.a.t.Request] (qtp510109769-17657:ctx-df50577c ctx-4569ebff) (logid:12987c31) Seq 4-1755277954767685440: Sending  { Cmd , MgmtId: 345049528973, via: 4(node4), Ver: v1, Flags: 100011, [{"com.cloud.agent.api.GetVncPortCommand":{"id":"20","name":"i-2-20-VM","wait":"0","bypassHostMaintenance":"false"}}] }
2025-04-15 20:49:40,766 DEBUG [c.c.a.t.Request] (AgentManager-Handler-3:null) (logid:) Seq 4-1755277954767685440: Processing:  { Ans: , MgmtId: 345049528973, via: 4, Ver: v1, Flags: 10, [{"com.cloud.agent.api.GetVncPortAnswer":{"address":"10.66.50.54","port":"5901","result":"true","wait":"0","bypassHostMaintenance":"false"}}] }
2025-04-15 20:49:40,766 DEBUG [c.c.a.t.Request] (qtp510109769-17657:ctx-df50577c ctx-4569ebff) (logid:12987c31) Seq 4-1755277954767685440: Received:  { Ans: , MgmtId: 345049528973, via: 4(node4), Ver: v1, Flags: 10, { GetVncPortAnswer } }
2025-04-15 20:49:40,766 DEBUG [o.a.c.c.ConsoleAccessManagerImpl] (qtp510109769-17657:ctx-df50577c ctx-4569ebff) (logid:12987c31) Port info 10.66.50.54
2025-04-15 20:49:40,766 INFO  [o.a.c.c.ConsoleAccessManagerImpl] (qtp510109769-17657:ctx-df50577c ctx-4569ebff) (logid:12987c31) Parse host info returned from executing GetVNCPortCommand. host info: 10.66.50.54
2025-04-15 20:49:40,772 DEBUG [o.a.c.c.ConsoleAccessManagerImpl] (qtp510109769-17657:ctx-df50577c ctx-4569ebff) (logid:12987c31) Compose console url: https://<HidenExternalAddress>/resource/noVNC/vnc.html?autoconnect=true&port=8443&token=1h1aNbF1mhl4d4Zt3zu4fiWQeFm8PKrH4ngY1FUkxpT8lDcs7mZGunDO55JqhFU0O3BpVcQ501Rn64Q96ymb5qz4T0Se+NZlt24LpWrG4iO6lLoytHm9Vf4PnjDLUmf47rUFtZ+ijnksYpRHCl6VQhRYubEHKAHKjXEXLRi+SWyD06VOWlQrwlmKfu7RI2GsHoy9ovrTl8Usg0dcdL2As2xcPhQhgkgdXG/rQ/KElnZuZIowLgpQyY6VcH3YJL53kg7Y4jXiQcUlOSAgGW2tVDj6YMBF232eq9t6wD2/u+lZ3Y4wJG+6hLTpmfCU4BP04OWVB6to+oRYXDrJeCEszfm32HvGnY/rKB/6eZvTh3j4WwVYSPYL5t6wlsJochA5krAG02AguS60Os9Uvj+jHJxHIx8yWlhxDbIKAowsrBLuq/DqyHwRb+92IrhreYyIEQVTnDLDKJN2dWIY29eDp/lH8RytdVmG5Vsm8ejfZWbgmaskTXI+TjnutqedQeGq8hVb1O7NLvdXBljRtDCD2Piw
2025-04-15 20:49:40,772 DEBUG [o.a.c.c.ConsoleAccessManagerImpl] (qtp510109769-17657:ctx-df50577c ctx-4569ebff) (logid:12987c31) Adding allowed session: bfc95bb0-0b0a-4db3-b6d9-d2478bf9eed1
2025-04-15 20:49:40,856 DEBUG [o.a.c.c.ConsoleAccessManagerImpl] (qtp510109769-17657:ctx-df50577c ctx-4569ebff) (logid:12987c31) The console URL is: https://<HidenExternalAddress>/resource/noVNC/vnc.html?autoconnect=true&port=8443&token=1h1aNbF1mhl4d4Zt3zu4fiWQeFm8PKrH4ngY1FUkxpT8lDcs7mZGunDO55JqhFU0O3BpVcQ501Rn64Q96ymb5qz4T0Se+NZlt24LpWrG4iO6lLoytHm9Vf4PnjDLUmf47rUFtZ+ijnksYpRHCl6VQhRYubEHKAHKjXEXLRi+SWyD06VOWlQrwlmKfu7RI2GsHoy9ovrTl8Usg0dcdL2As2xcPhQhgkgdXG/rQ/KElnZuZIowLgpQyY6VcH3YJL53kg7Y4jXiQcUlOSAgGW2tVDj6YMBF232eq9t6wD2/u+lZ3Y4wJG+6hLTpmfCU4BP04OWVB6to+oRYXDrJeCEszfm32HvGnY/rKB/6eZvTh3j4WwVYSPYL5t6wlsJochA5krAG02AguS60Os9Uvj+jHJxHIx8yWlhxDbIKAowsrBLuq/DqyHwRb+92IrhreYyIEQVTnDLDKJN2dWIY29eDp/lH8RytdVmG5Vsm8ejfZWbgmaskTXI+TjnutqedQeGq8hVb1O7NLvdXBljRtDCD2Piw
2025-04-15 20:49:50,552 DEBUG [c.c.a.t.Request] (qtp510109769-17657:ctx-b550d901 ctx-2d9f8a44) (logid:d359886e) Seq 4-1755277954767685443: Sending  { Cmd , MgmtId: 345049528973, via: 4(node4), Ver: v1, Flags: 100011, [{"com.cloud.agent.api.GetVncPortCommand":{"id":"21","name":"i-2-21-VM","wait":"0","bypassHostMaintenance":"false"}}] }
2025-04-15 20:49:51,773 DEBUG [c.c.a.t.Request] (AgentManager-Handler-11:null) (logid:) Seq 4-1755277954767685443: Processing:  { Ans: , MgmtId: 345049528973, via: 4, Ver: v1, Flags: 10, [{"com.cloud.agent.api.GetVncPortAnswer":{"address":"10.66.50.54","port":"5900","result":"true","wait":"0","bypassHostMaintenance":"false"}}] }
2025-04-15 20:49:51,774 DEBUG [c.c.a.t.Request] (qtp510109769-17657:ctx-b550d901 ctx-2d9f8a44) (logid:d359886e) Seq 4-1755277954767685443: Received:  { Ans: , MgmtId: 345049528973, via: 4(node4), Ver: v1, Flags: 10, { GetVncPortAnswer } }
2025-04-15 20:49:51,774 DEBUG [o.a.c.c.ConsoleAccessManagerImpl] (qtp510109769-17657:ctx-b550d901 ctx-2d9f8a44) (logid:d359886e) Port info 10.66.50.54
2025-04-15 20:49:51,774 INFO  [o.a.c.c.ConsoleAccessManagerImpl] (qtp510109769-17657:ctx-b550d901 ctx-2d9f8a44) (logid:d359886e) Parse host info returned from executing GetVNCPortCommand. host info: 10.66.50.54
2025-04-15 20:49:51,779 DEBUG [o.a.c.c.ConsoleAccessManagerImpl] (qtp510109769-17657:ctx-b550d901 ctx-2d9f8a44) (logid:d359886e) Compose console url: https://<HidenExternalAddress>/resource/noVNC/vnc.html?autoconnect=true&port=8443&token=k1wgXX2Bk+iXGfkZsBYvTqjmAtX//E4H18H8uw9G+YjH185imW4YHnKYcQMe6B9tkGfiaAQif2Tj65IHIYpyyXKKuEHbuj/xodROP3uEAyLD2FL8nkzwJQePmfK6tlcNAfG8hYIEGKqCuQYOU4FkZ2xCa3Qq3g1SuKQJJ9yM5gybF3ECFwLFN8y+DRW/CA1tGQknRieB+BwEVSwmmRsLowWM+ez2nYXIRHoRv6knPsCGdtK/21FWJ2UGjnDWY8+9iza/YRIeSOzGs4HAxdafsch6LnLBWjMKZ9zmom1Yn+AddgU7Wkf++hV2mdSRMpep2NgtC+qeAXovywH/wRepmNUExZ/V67rf6GPchcNNPH2YGXhL3XvCk/fDbotxpPvbuznBuqdzxhBLVzsHx26jywyVYjF+VfE7CzYUld3vhJM9tq52eqHFVYEG553oscVRh4FtTIq031qKGUoWhXLIUCxBS9x8v9qBmbcJgLCjvyrjFBAwnEl26yaufAQbk4MpQDGTP2GDs1h8YczncDI0M5Kz
2025-04-15 20:49:51,779 DEBUG [o.a.c.c.ConsoleAccessManagerImpl] (qtp510109769-17657:ctx-b550d901 ctx-2d9f8a44) (logid:d359886e) Adding allowed session: bd09144a-7011-44b9-bb89-bf257d7d89d2
2025-04-15 20:49:51,864 DEBUG [o.a.c.c.ConsoleAccessManagerImpl] (qtp510109769-17657:ctx-b550d901 ctx-2d9f8a44) (logid:d359886e) The console URL is: https://<HidenExternalAddress>/resource/noVNC/vnc.html?autoconnect=true&port=8443&token=k1wgXX2Bk+iXGfkZsBYvTqjmAtX//E4H18H8uw9G+YjH185imW4YHnKYcQMe6B9tkGfiaAQif2Tj65IHIYpyyXKKuEHbuj/xodROP3uEAyLD2FL8nkzwJQePmfK6tlcNAfG8hYIEGKqCuQYOU4FkZ2xCa3Qq3g1SuKQJJ9yM5gybF3ECFwLFN8y+DRW/CA1tGQknRieB+BwEVSwmmRsLowWM+ez2nYXIRHoRv6knPsCGdtK/21FWJ2UGjnDWY8+9iza/YRIeSOzGs4HAxdafsch6LnLBWjMKZ9zmom1Yn+AddgU7Wkf++hV2mdSRMpep2NgtC+qeAXovywH/wRepmNUExZ/V67rf6GPchcNNPH2YGXhL3XvCk/fDbotxpPvbuznBuqdzxhBLVzsHx26jywyVYjF+VfE7CzYUld3vhJM9tq52eqHFVYEG553oscVRh4FtTIq031qKGUoWhXLIUCxBS9x8v9qBmbcJgLCjvyrjFBAwnEl26yaufAQbk4MpQDGTP2GDs1h8YczncDI0M5Kz

SviridoffA avatar Apr 15 '25 17:04 SviridoffA

Calculated the timestamp of GetVncPortCommand and GetVncPortAnswer,

  • for the first two vms (on node03), the duration is less than 1 second
  • for the last two vms (on node04), the duration is 1.2-1.3 seconds, a bit longer.

Does node04 have more running vms, higher cpu usage, or less free memory ?

weizhouapache avatar Apr 15 '25 18:04 weizhouapache

Cluster mostly on idle.. DRS turned on. All nodes in one same network switch.

Image

SviridoffA avatar Apr 15 '25 19:04 SviridoffA

Cluster mostly on idle.. DRS turned on. All nodes in one same network switch.

that's a bit weird

actually the process of GetVncPortCommand on the kvm host is pretty simple

https://github.com/apache/cloudstack/blob/main/plugins/hypervisors/kvm/src/main/java/com/cloud/hypervisor/kvm/resource/wrapper/LibvirtGetVncPortCommandWrapper.java

can you update log level and retry ?

cp /etc/cloudstack/agent/log4j-cloud.xml /etc/cloudstack/agent/log4j-cloud.xml.orig.info
sed -i "s/INFO/DEBUG/g" /etc/cloudstack/agent/log4j-cloud.xml

weizhouapache avatar Apr 16 '25 06:04 weizhouapache

I executed sed -i "s/INFO/DEBUG/g" /etc/cloudstack/agent/log4j-cloud.xml and see DEBUG messages even without restart agent.

here a fragment of console connection (node4)

2025-04-16 13:29:46,730 DEBUG [cloud.agent.Agent] (agentRequest-Handler-4:null) (logid:8aa1682b) Processing command: com.cloud.agent.api.GetHostStatsCommand
2025-04-16 13:30:02,157 DEBUG [cloud.agent.Agent] (agentRequest-Handler-1:null) (logid:0321b531) Processing command: com.cloud.agent.api.GetHostStatsCommand
2025-04-16 13:30:03,301 DEBUG [cloud.agent.Agent] (agentRequest-Handler-3:null) (logid:ad81f47e) Request:Seq 4-1755277954767687211:  { Cmd , MgmtId: 345049528973, via: 4, Ver: v1, Flags: 10001
1, [{"com.cloud.agent.api.GetVncPortCommand":{"id":"21","name":"i-2-21-VM","wait":"0","bypassHostMaintenance":"false"}}] }
2025-04-16 13:30:03,301 DEBUG [cloud.agent.Agent] (agentRequest-Handler-3:null) (logid:ad81f47e) Processing command: com.cloud.agent.api.GetVncPortCommand
2025-04-16 13:30:03,302 DEBUG [agent.properties.AgentPropertiesFileHandler] (agentRequest-Handler-3:null) (logid:ad81f47e) Property [hypervisor.uri] has empty or null value. Using default valu
e [null].
2025-04-16 13:30:03,302 DEBUG [kvm.resource.LibvirtConnection] (agentRequest-Handler-3:null) (logid:ad81f47e) Looking for libvirtd connection at: qemu:///system
2025-04-16 13:30:04,184 DEBUG [kvm.resource.LibvirtVMDef] (agentRequest-Handler-3:null) (logid:ad81f47e) Using informed label [hdc] for volume [null].
2025-04-16 13:30:04,185 DEBUG [cloud.agent.Agent] (agentRequest-Handler-3:null) (logid:ad81f47e) Seq 4-1755277954767687211:  { Ans: , MgmtId: 345049528973, via: 4, Ver: v1, Flags: 10, [{"com.c
loud.agent.api.GetVncPortAnswer":{"address":"10.66.50.54","port":"5900","result":"true","wait":"0","bypassHostMaintenance":"false"}}] }
2025-04-16 13:30:23,415 DEBUG [cloud.agent.Agent] (agentRequest-Handler-5:null) (logid:0d0fceb3) Processing command: com.cloud.agent.api.GetVolumeStatsCommand
2025-04-16 13:30:23,415 DEBUG [kvm.resource.LibvirtConnection] (agentRequest-Handler-5:null) (logid:0d0fceb3) Looking for libvirtd connection at: qemu:///system
2025-04-16 13:30:23,569 DEBUG [kvm.storage.LinstorStoragePool] (Thread-1:null) (logid:) Executing command [drbdsetup events2 --now ].
2025-04-16 13:30:23,576 DEBUG [kvm.storage.LinstorStoragePool] (Thread-1:null) (logid:) Successfully executed process [2444512] for command [drbdsetup events2 --now ].
2025-04-16 13:30:23,638 DEBUG [kvm.storage.LinstorStorageAdaptor] (agentRequest-Handler-5:null) (logid:0d0fceb3) Linstor getStoragePool: 90341db9-2f21-4226-bb72-5807b71db3e6 -> false
2025-04-16 13:30:23,639 DEBUG [kvm.storage.LinstorStorageAdaptor] (agentRequest-Handler-5:null) (logid:0d0fceb3) Linstor: getPhysicalDisk for 5faf5d81-df40-4999-8365-09a6ada27c7b
2025-04-16 13:30:23,645 DEBUG [datastore.util.LinstorUtil] (agentRequest-Handler-5:null) (logid:0d0fceb3) getDevicePath: cs-5faf5d81-df40-4999-8365-09a6ada27c7b -> /dev/drbd1018

SviridoffA avatar Apr 16 '25 10:04 SviridoffA

2025-04-16 13:30:03,301 DEBUG [cloud.agent.Agent] (agentRequest-Handler-3:null) (logid:ad81f47e) Request:Seq 4-1755277954767687211: { Cmd , MgmtId: 345049528973, via: 4, Ver: v1, Flags: 10001 1, [{"com.cloud.agent.api.GetVncPortCommand":{"id":"21","name":"i-2-21-VM","wait":"0","bypassHostMaintenance":"false"}}] } 2025-04-16 13:30:03,301 DEBUG [cloud.agent.Agent] (agentRequest-Handler-3:null) (logid:ad81f47e) Processing command: com.cloud.agent.api.GetVncPortCommand

2025-04-16 13:30:04,185 DEBUG [cloud.agent.Agent] (agentRequest-Handler-3:null) (logid:ad81f47e) Seq 4-1755277954767687211: { Ans: , MgmtId: 345049528973, via: 4, Ver: v1, Flags: 10, [{"com.c loud.agent.api.GetVncPortAnswer":{"address":"10.66.50.54","port":"5900","result":"true","wait":"0","bypassHostMaintenance":"false"}}] }

on my testing env, it is much faster

2025-04-16T10:50:37,975 DEBUG [cloud.agent.Agent] (AgentRequest-Handler-3:[]) (logid:) Processing command: com.cloud.agent.api.GetVncPortCommand

2025-04-16T10:50:37,983 DEBUG [cloud.agent.Agent] (AgentRequest-Handler-3:[]) (logid:) Seq 27-3151112364275793966: { Ans: , MgmtId: 32988637168683, via: 27, Ver: v1, Flags: 10, [{"com.cloud.agent.api.GetVncPortAnswer":{"address":"10.0.35.82","port":"5900","result":"true","wait":"0","bypassHostMaintenance":"false"}}] }

weizhouapache avatar Apr 16 '25 11:04 weizhouapache

On my node1, where I restart cloudstack agent about 2 days ago, its much faster too

2025-04-16 14:13:01,792 DEBUG [cloud.agent.Agent] (agentRequest-Handler-2:null) (logid:59cca398) Request:Seq 1-41939771529890292: { Cmd , MgmtId: 345049528973, via: 1, Ver: v1, Flags: 100011, [{"com.cloud.agent.api.GetVncPortCommand":{"id":"15","name":"i-2-15-VM","wait":"0","bypassHostMaintenance":"false"}}] } 2025-04-16 14:13:01,792 DEBUG [cloud.agent.Agent] (agentRequest-Handler-2:null) (logid:59cca398) Processing command: com.cloud.agent.api.GetVncPortCommand 2025-04-16 14:13:01,793 DEBUG [agent.properties.AgentPropertiesFileHandler] (agentRequest-Handler-2:null) (logid:59cca398) Property [hypervisor.uri] has empty or null value. Using default valu e [null]. 2025-04-16 14:13:01,793 DEBUG [kvm.resource.LibvirtConnection] (agentRequest-Handler-2:null) (logid:59cca398) Looking for libvirtd connection at: qemu:///system 2025-04-16 14:13:01,795 DEBUG [kvm.resource.LibvirtVMDef] (agentRequest-Handler-2:null) (logid:59cca398) Using informed label [hdc] for volume [null]. 2025-04-16 14:13:01,796 DEBUG [cloud.agent.Agent] (agentRequest-Handler-2:null) (logid:59cca398) Seq 1-41939771529890292: { Ans: , MgmtId: 345049528973, via: 1, Ver: v1, Flags: 10, [{"com.clo ud.agent.api.GetVncPortAnswer":{"address":"10.66.50.51","port":"5901","result":"true","wait":"0","bypassHostMaintenance":"false"}}] }

but was the same). To be honest, about 2 seconds delay it's not a problem. Problem it's 4 and 8 seconds. I’d like to suggest we pause on this issue for a month, or at least until delay increases again to 4 seconds or more. Perhaps we’ll see something new in the logs. Or we could close this issue if the delay stops increasing.

SviridoffA avatar Apr 16 '25 11:04 SviridoffA

just in case... my agent prooerties

Image

SviridoffA avatar Apr 16 '25 11:04 SviridoffA

node4 console connection after restarting cloudstack agent

2025-04-16 15:17:02,932 DEBUG [cloud.agent.Agent] (agentRequest-Handler-5:null) (logid:38a02db1) Request:Seq 4-2075033528310956043:  { Cmd , MgmtId: 345049528973, via: 4, Ver: v1, Flags: 10001
1, [{"com.cloud.agent.api.GetVncPortCommand":{"id":"21","name":"i-2-21-VM","wait":"0","bypassHostMaintenance":"false"}}] }
2025-04-16 15:17:02,933 DEBUG [cloud.agent.Agent] (agentRequest-Handler-5:null) (logid:38a02db1) Processing command: com.cloud.agent.api.GetVncPortCommand
2025-04-16 15:17:02,933 DEBUG [agent.properties.AgentPropertiesFileHandler] (agentRequest-Handler-5:null) (logid:38a02db1) Property [hypervisor.uri] has empty or null value. Using default valu
e [null].
2025-04-16 15:17:02,933 DEBUG [kvm.resource.LibvirtConnection] (agentRequest-Handler-5:null) (logid:38a02db1) Looking for libvirtd connection at: qemu:///system
2025-04-16 15:17:02,938 DEBUG [kvm.resource.LibvirtVMDef] (agentRequest-Handler-5:null) (logid:38a02db1) Using informed label [hdc] for volume [null].
2025-04-16 15:17:02,940 DEBUG [cloud.agent.Agent] (agentRequest-Handler-5:null) (logid:38a02db1) Seq 4-2075033528310956043:  { Ans: , MgmtId: 345049528973, via: 4, Ver: v1, Flags: 10, [{"com.c
loud.agent.api.GetVncPortAnswer":{"address":"10.66.50.54","port":"5900","result":"true","wait":"0","bypassHostMaintenance":"false"}}] }

much faster and in browser opens immediately

SviridoffA avatar Apr 16 '25 12:04 SviridoffA

node4 console connection after restarting cloudstack agent

2025-04-16 15:17:02,932 DEBUG [cloud.agent.Agent] (agentRequest-Handler-5:null) (logid:38a02db1) Request:Seq 4-2075033528310956043:  { Cmd , MgmtId: 345049528973, via: 4, Ver: v1, Flags: 10001
1, [{"com.cloud.agent.api.GetVncPortCommand":{"id":"21","name":"i-2-21-VM","wait":"0","bypassHostMaintenance":"false"}}] }
2025-04-16 15:17:02,933 DEBUG [cloud.agent.Agent] (agentRequest-Handler-5:null) (logid:38a02db1) Processing command: com.cloud.agent.api.GetVncPortCommand
2025-04-16 15:17:02,933 DEBUG [agent.properties.AgentPropertiesFileHandler] (agentRequest-Handler-5:null) (logid:38a02db1) Property [hypervisor.uri] has empty or null value. Using default valu
e [null].
2025-04-16 15:17:02,933 DEBUG [kvm.resource.LibvirtConnection] (agentRequest-Handler-5:null) (logid:38a02db1) Looking for libvirtd connection at: qemu:///system
2025-04-16 15:17:02,938 DEBUG [kvm.resource.LibvirtVMDef] (agentRequest-Handler-5:null) (logid:38a02db1) Using informed label [hdc] for volume [null].
2025-04-16 15:17:02,940 DEBUG [cloud.agent.Agent] (agentRequest-Handler-5:null) (logid:38a02db1) Seq 4-2075033528310956043:  { Ans: , MgmtId: 345049528973, via: 4, Ver: v1, Flags: 10, [{"com.c
loud.agent.api.GetVncPortAnswer":{"address":"10.66.50.54","port":"5900","result":"true","wait":"0","bypassHostMaintenance":"false"}}] }

much faster and in browser opens immediately

this looks good

it looks like this step caused the issue

LibvirtConnection.getConnectionByVmName(vmName);

I wonder if it is because of different storages (we use NFS)

weizhouapache avatar Apr 16 '25 12:04 weizhouapache

I wonder if it is because of different storages (we use NFS)

I'm using Linstor 1.30.4

SviridoffA avatar Apr 16 '25 13:04 SviridoffA

Hi Everyone! I Just wanted to give an update on the current situation. I waited a month to grow up some delay. Currently, systemctl status cloudstack-agent on all my nodes shows “active since 1 month 3 days.”

Current delay on my nodes:

node1 - 5 seconds node2 - 6 seconds node3 - no delay node4 - 2 seconds

I enabled Debug on the node2, and it shows the following…

Image

If anyone has any ideas what else I can check, I won’t restart the agent service until weekend, after this I’ll restart it, because current delay is already critical for us.

SviridoffA avatar May 20 '25 13:05 SviridoffA

Hi @rp- ! Have you experienced a similar problem? Maybe you could suggest what I should check? Or what I might have done wrong

SviridoffA avatar May 20 '25 14:05 SviridoffA

First time I hear about something like this, in our internal cluster (which was updated/restarted 10 days ago) vm console is basically instant.

How is it if you try to get a console with virsh directly on a host?

rp- avatar May 20 '25 14:05 rp-

@rp- Thanks for you reply. What do you mean by saying ‘console with virsh directly on the host’? If you’re referring to virsh console command, then connection works without any issues. If you mean connecting with a VNC client directly to the host and the port which we can see via virsh vncdisplay, then unfortunately, I don’t currently know how to perform this connection. I’ve tried using various VNC clients and with virt-manager, but without success. It seems to be related to CA certificates. But in any case, according to the logs, the delay isn’t occurring with the connection itself, but with operations of CloudStack between the GetVncPortCommand request and the GetVncPortAnswer. There’s a long delay before we get the port response. Anyway, I’d appreciate it if you could give me some pointers on how to check the console connection with virsh.

SviridoffA avatar May 20 '25 18:05 SviridoffA

@rp- Thanks for you reply. What do you mean by saying ‘console with virsh directly on the host’? If you’re referring to virsh console command, then connection works without any issues. If you mean connecting with a VNC client directly to the host and the port which we can see via virsh vncdisplay, then unfortunately, I don’t currently know how to perform this connection. I’ve tried using various VNC clients and with virt-manager, but without success. It seems to be related to CA certificates. But in any case, according to the logs, the delay isn’t occurring with the connection itself, but with operations of CloudStack between the GetVncPortCommand request and the GetVncPortAnswer. There’s a long delay before we get the port response. Anyway, I’d appreciate it if you could give me some pointers on how to check the console connection with virsh.

if restarting the cloudstack-agent solves the issue, I guess the issue is caused by cloudstack-agent It might be because libvirt connections are not closed need investigation

weizhouapache avatar May 20 '25 18:05 weizhouapache

if restarting the cloudstack-agent solves the issue, I guess the issue is caused by cloudstack-agent It might be because libvirt connections are not closed need investigation

Not sure if its related, but i have noticed some messages about libvirt connections. This messages repeats periodicly.

2025-05-20T21:27:03.780296+03:00 node2 java[2093549]: DEBUG [kvm.resource.LibvirtConnection] (agentRequest-Handler-5:) (logid:bf212355) Looking for libvirtd connection at: qemu:///system
2025-05-20T21:27:09.773421+03:00 node2 java[2093549]: DEBUG [cloud.agent.Agent] (agentRequest-Handler-2:) (logid:4ae93de1) Processing command: com.cloud.agent.api.GetHostStatsCommand
2025-05-20T21:27:09.938961+03:00 node2 java[2093549]: DEBUG [cloud.agent.Agent] (agentRequest-Handler-4:) (logid:f05c61be) Processing command: com.cloud.agent.api.GetVmStatsCommand
2025-05-20T21:27:09.939183+03:00 node2 java[2093549]: DEBUG [agent.properties.AgentPropertiesFileHandler] (agentRequest-Handler-4:) (logid:f05c61be) Property [hypervisor.uri] has empty or null
 value. Using default value [null].

And nothing else suspicious.

SviridoffA avatar May 20 '25 18:05 SviridoffA

@rp- Thanks for you reply. What do you mean by saying ‘console with virsh directly on the host’? If you’re referring to virsh console command, then connection works without any issues. If you mean connecting with a VNC client directly to the host and the port which we can see via virsh vncdisplay, then unfortunately, I don’t currently know how to perform this connection. I’ve tried using various VNC clients and with virt-manager, but without success. It seems to be related to CA certificates. But in any case, according to the logs, the delay isn’t occurring with the connection itself, but with operations of CloudStack between the GetVncPortCommand request and the GetVncPortAnswer. There’s a long delay before we get the port response. Anyway, I’d appreciate it if you could give me some pointers on how to check the console connection with virsh.

I just meant the virsh console command, but as Wei said, I would also suspect something with the agent <-> libvirt connection. But I don't have much experience with that part.

rp- avatar May 21 '25 05:05 rp-

Restarting libvirtd solves this problem too. All I have noticed its about 5 or 8 kvm pools befor restart, and after restart only 1 pool. And looks like nothing else changed. I turn on debug level in libvirtd logs. Let’s see if we can find something.

SviridoffA avatar May 21 '25 13:05 SviridoffA