High CPU utilization

Open alexvancasper opened this issue 2 years ago • 7 comments

Describe the bug

We observe high CPU utilization by the vmtoolsd process when the guest system has many network interfaces (in our case, 3000 local interfaces).

Workaround:

vmware-toolbox-cmd config set guestinfo poll-interval 0

However, this command does not completely solve the high CPU usage. We still observe 100% CPU load roughly every 30 seconds.

How can we decrease the high CPU load? If we disable stats with vmware-toolbox-cmd config set guestinfo stats-interval 0, what are the side effects?

Reproduction steps

  1. Create ~3K interfaces in the guest system, with no traffic on the node.
  2. Check the CPU utilization. Periodically, the system will show 100% CPU load.
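
A minimal sketch of step 1, assuming the interfaces are Linux dummy devices created with iproute2. The report does not say which interface type was used, so the device type, the "dummy" name prefix, and the helper names below are all illustrative assumptions. By default the script only prints the commands. Running them for real requires root and should be tried on a test VM.

```python
import subprocess


def dummy_interface_commands(count, prefix="dummy"):
    """Build the iproute2 commands that create `count` dummy interfaces.

    The interface type and the name prefix are assumptions, not taken
    from the original report.
    """
    commands = []
    for index in range(count):
        name = f"{prefix}{index}"
        commands.append(["ip", "link", "add", name, "type", "dummy"])
        commands.append(["ip", "link", "set", name, "up"])
    return commands


def apply(commands, dry_run=True):
    """Print the commands, or execute them when dry_run is False (needs root)."""
    for command in commands:
        if dry_run:
            print(" ".join(command))
        else:
            subprocess.run(command, check=True)


commands = dummy_interface_commands(3000)
apply(commands[:2])  # dry run: show the first two commands only
```

Calling apply(commands, dry_run=False) would create all of the interfaces, after which the vmtoolsd CPU usage can be watched as described in step 2.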

Expected behavior

Multiple interfaces should not affect CPU usage, especially when there is no traffic on the host.

Additional context

Guest: Wind River Linux Secure v.8.0.0.32. vmtools version: 10.1.5.59732.

alexvancasper avatar Apr 11 '23 10:04 alexvancasper

What version of Tools?

If you set the poll-interval to 0, then it shouldn't be looking for NICs at all, so the cause may be something else. About 5 years ago there were issues with IPv6 routes, which are expensive to query in Linux.

Setting stats-interval to 0 will disable stats collection which is used by vSphere APIs and management applications (such as VROps).

lemke1458 avatar Apr 11 '23 20:04 lemke1458

The vmtools version is 10.1.5.59732 (also added to the original ticket). How can we find the cause of the high CPU usage?

alexvancasper avatar Apr 11 '23 21:04 alexvancasper

It's possible it's the routing issue. That was fixed in 10.3.10: https://github.com/vmware/open-vm-tools/commit/065f09b94e09f1127901db29e73cc9b9f36df4fc

10.1.5 was released about 6 years ago, and there have been many changes since then.

lemke1458 avatar Apr 11 '23 23:04 lemke1458

@alexvancasper

With 3K interfaces and open-vm-tools 10.1.5, you are probably hitting the routing table issue reported in https://github.com/vmware/open-vm-tools/issues/186

Your description suggests that setting the poll-interval to 0 does not avoid the problem. Would you share the contents of any /etc/vmware-tools/tools.conf on the Wind River Linux VM?

johnwvmw avatar Apr 12 '23 01:04 johnwvmw

In the VM we have only connected routes (3K interfaces), no external routes at all.

The behavior changed when I applied poll-interval=0.

Without modification

2023-04-05 11:03:21+02:00 5241 vmtoolsd 0.0
2023-04-05 11:03:23+02:00 5241 vmtoolsd 0.0
2023-04-05 11:03:25+02:00 5241 vmtoolsd 0.0
2023-04-05 11:03:27+02:00 5241 vmtoolsd 93.3
2023-04-05 11:03:30+02:00 5241 vmtoolsd 93.3
2023-04-05 11:03:32+02:00 5241 vmtoolsd 100.0
2023-04-05 11:03:34+02:00 5241 vmtoolsd 93.3
2023-04-05 11:03:36+02:00 5241 vmtoolsd 93.3
2023-04-05 11:03:38+02:00 5241 vmtoolsd 93.3
2023-04-05 11:03:41+02:00 5241 vmtoolsd 0.0
2023-04-05 11:03:43+02:00 5241 vmtoolsd 0.0

With modification

2023-04-05 13:05:58+02:00 5241 vmtoolsd 0.0
2023-04-05 13:06:00+02:00 5241 vmtoolsd 0.0
2023-04-05 13:06:02+02:00 5241 vmtoolsd 0.0
2023-04-05 13:06:04+02:00 5241 vmtoolsd 93.3
2023-04-05 13:06:07+02:00 5241 vmtoolsd 0.0
2023-04-05 13:06:09+02:00 5241 vmtoolsd 0.0
2023-04-05 13:06:11+02:00 5241 vmtoolsd 0.0
2023-04-05 13:06:13+02:00 5241 vmtoolsd 0.0
2023-04-05 13:06:15+02:00 5241 vmtoolsd 0.0
2023-04-05 13:06:18+02:00 5241 vmtoolsd 0.0
2023-04-05 13:06:20+02:00 5241 vmtoolsd 0.0
2023-04-05 13:06:22+02:00 5241 vmtoolsd 0.0
2023-04-05 13:06:24+02:00 5241 vmtoolsd 0.0
2023-04-05 13:06:26+02:00 5241 vmtoolsd 0.0
2023-04-05 13:06:28+02:00 5241 vmtoolsd 0.0
2023-04-05 13:06:30+02:00 5241 vmtoolsd 0.0
2023-04-05 13:06:33+02:00 5241 vmtoolsd 0.0
2023-04-05 13:06:35+02:00 5241 vmtoolsd 93.3
2023-04-05 13:06:37+02:00 5241 vmtoolsd 0.0
2023-04-05 13:06:39+02:00 5241 vmtoolsd 0.0
2023-04-05 13:06:41+02:00 5241 vmtoolsd 0.0
2023-04-05 13:06:43+02:00 5241 vmtoolsd 0.0
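
To estimate how often the spikes recur, samples in the format above can be parsed and the time between spike starts measured. This is a rough sketch: the 50% threshold and the function names are assumptions, and the embedded samples are a subset copied from the second listing.

```python
from datetime import datetime

# A subset of the "with modification" samples shown above.
SAMPLES = """\
2023-04-05 13:06:02+02:00 5241 vmtoolsd 0.0
2023-04-05 13:06:04+02:00 5241 vmtoolsd 93.3
2023-04-05 13:06:07+02:00 5241 vmtoolsd 0.0
2023-04-05 13:06:33+02:00 5241 vmtoolsd 0.0
2023-04-05 13:06:35+02:00 5241 vmtoolsd 93.3
2023-04-05 13:06:37+02:00 5241 vmtoolsd 0.0
"""


def parse_sample(line):
    """Split one sample line into (timestamp, cpu_percent)."""
    date, clock, _pid, _name, cpu = line.split()
    return datetime.fromisoformat(f"{date} {clock}"), float(cpu)


def spike_starts(samples, threshold=50.0):
    """Return timestamps where CPU usage rises to or above the threshold."""
    starts = []
    previous_busy = False
    for stamp, cpu in samples:
        busy = cpu >= threshold
        if busy and not previous_busy:
            starts.append(stamp)
        previous_busy = busy
    return starts


samples = [parse_sample(line) for line in SAMPLES.splitlines() if line.strip()]
starts = spike_starts(samples)
gaps = [(later - earlier).total_seconds() for earlier, later in zip(starts, starts[1:])]
print(gaps)  # seconds between consecutive spike starts
```

For these samples the single gap is 31 seconds, which matches the roughly 30-second period reported in the issue.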

The line poll-interval=0 was added to tools.conf:

/etc/vmware-tools> cat tools.conf
[guestinfo]
disable-perf-mon=1
poll-interval=0

alexvancasper avatar Apr 12 '23 07:04 alexvancasper

@alexvancasper

Apologies for the response delay, it has been a hectic week.

The process information that you provided is consistent with the guestInfo query for routing information on a large quantity of IPv6 network devices.

Your second sample shows that "poll-interval = 0" has avoided the routing info query for IPv6 devices. The single-sample CPU spike every 30 seconds is the check to locate the primary IP for the VM. It will run through every network device.

As @lemke1458 indicated, the routing information CPU spike was addressed in VMware Tools/open-vm-tools 10.3.10.

Beginning with that release, two switches are available, "max-ipv4-routes" and "max-ipv6-routes", that can be configured to limit the number of devices to be queried.

Open-vm-tools 10.2.0 (and later) provides a switch in tools.conf, "primary-nics", which allows one to specify a comma-separated list of interface names to be considered primary / important.
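
As a hedged illustration of how these switches might be combined, the sketch below renders a tools.conf fragment with Python's configparser. Placing the three switches in the [guestinfo] section is an assumption based on where poll-interval sits in the tools.conf shown earlier in this thread. The values (eth0, 10) and the helper name are purely illustrative, so check the sample tools.conf shipped with your release before using them.

```python
import configparser
import io


def render_guestinfo(settings):
    """Render a [guestinfo] section in key=value form as a string."""
    parser = configparser.ConfigParser()
    parser["guestinfo"] = settings
    buffer = io.StringIO()
    parser.write(buffer, space_around_delimiters=False)
    return buffer.getvalue()


fragment = render_guestinfo({
    "poll-interval": "0",      # setting already used in this thread
    "primary-nics": "eth0",    # illustrative interface name
    "max-ipv4-routes": "10",   # illustrative limit
    "max-ipv6-routes": "10",   # illustrative limit
})
print(fragment)
```

The printed text could then be merged by hand into /etc/vmware-tools/tools.conf on a cloned VM.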

Wind River 8.0 and tools 11.0.5 are quite old.

Is an upgrade to either possible?

  • Is a later version of Wind River available, and will it provide a later version of tools?

Is your vmtools based on open-vm-tools or on the VMware (tar) Tools 11.0.5 (linux.iso) provided by VMware?

  • If open-vm-tools, is there a later update of open-vm-tools available from the WR Linux Secure community?
  • If VMware (tar) Tools, you can get later releases, 10.3.20 - 10.3.25, from the VMware download site.
      - Check out the 10.3 series of the VMware Tools Release Notes at https://docs.vmware.com/en/VMware-Tools/index.html
      - Pay close attention to limitations on the minimum version of glibc required, as noted in the release notes.
      - Be certain to experiment on a clone of your WRLS VM.

johnwvmw avatar Apr 17 '23 02:04 johnwvmw

If an upgrade is unavailable, the primary IP guestInfo spike can be made less frequent by adding a "tools.ipCheckInterval" setting that lengthens the default 30-second interval.

Add tools.ipCheckInterval = "some large number of seconds" to the .vmx file on the host.

Again, I suggest that you test the change on a cloned VM.

johnwvmw avatar Apr 17 '23 02:04 johnwvmw