psutil icon indicating copy to clipboard operation
psutil copied to clipboard

[RPi OS] Some functions take multiple seconds to return

Open Renaud11232 opened this issue 2 years ago • 3 comments

Summary

Hello,

I'm the developper of an OctoPrint plugin (ResourceMonitor) and I have gotten a lot of reports from users saying that my plugin is causing freezes on their 3D printer. I think the issue is caused by psutil taking multiple seconds to return results in some cases. I'm also sorry since if I cannot give some information on this issue because I am not able to reproduce it on my Octoprint install, this is based of information I received from users. ie: I cannot be accurate regarding the psutil version used. All I know is that it's >=5.8,<6 since that is what is set as a dependency of OctoPrint.

  • OS: RPi OS (OctoPi 0.17.0)
  • Architecture: ARM (Raspberry Pi 3 B rev1.2)
  • Psutil version: >=5.8,<6
  • Python version: 3.7
  • Type: performance

Description

I've had users reporting an issue where my plugin is causing freezes with their 3D printer only when my plugin is installed, resulting in failed/ugly prints. After looking at the logs I received, I noticed that some psutil functions take multiple seconds to return. My plugin uses psutil in a separate thread, but since psutil is calling some native functions, I think that it can prevent other threads from running because of the GIL. But I cannot be 100% certain since I don't get the same issue on my setup.

Here is the output of the equivalent of the print_api_speed script I added in my plugin :

SYSTEM APIS               SECONDS
----------------------------------
sensors_battery           0.000052
cpu_count                 0.000121
disk_usage                0.000142
users                     0.000347
cpu_times                 0.000494
pid_exists                0.000557
cpu_stats                 0.000581
cpu_percent               0.000634
cpu_times_percent         0.000745
boot_time                 0.000834
virtual_memory            0.000864
cpu_count (cores)         0.001160
net_if_addrs              0.001323
net_io_counters           0.001329
swap_memory               0.001371
net_if_stats              0.001451
pids                      0.001511
sensors_fans              0.002179
sensors_temperatures      0.002423
cpu_freq                  0.002627
disk_partitions           0.002676
disk_io_counters          0.004596
process_iter (all)        0.021666
net_connections           0.027797

PROCESS APIS              SECONDS
----------------------------------
create_time               0.000006
nice                      0.000024
cpu_affinity              0.000045
exe                       0.000053
ionice                    0.000063
cwd                       0.000078
num_threads               0.000157
status                    0.000176
num_fds                   0.000189
io_counters               0.000190
num_ctx_switches          0.000214
uids                      0.000227
gids                      0.000228
ppid                      0.000230
cpu_percent               0.000239
cpu_num                   0.000246
cpu_times                 0.000248
is_running                0.000269
cmdline                   0.000278
environ                   0.000286
name                      0.000293
username                  0.000324
memory_info               0.000344
memory_percent            0.000427
open_files                0.002762
connections               0.004220
terminal                  0.004986
threads                   0.005430
children                  0.013436
memory_full_info          0.029548
memory_maps               0.140885

But in some cases, the functions are hanging. Here is an example from the logs of my plugin where net_if_stats took ~1.2s to return :

[...]
2022-04-15 08:10:37,063 - octoprint.plugins.resource_monitor - DEBUG - disk_usage(partition["mountpoint"] : sdiskusage(total=264289280, used=50512896, free=213776384, percent=19.1)
2022-04-15 08:10:37,067 - octoprint.plugins.resource_monitor - DEBUG - net_io_counters(pernic=True) : {'lo': snetio(bytes_sent=205795016, bytes_recv=205795016, packets_sent=224664, packets_recv=224664, errin=0, errout=0, dropin=0, dropout=0), 'eth0': snetio(bytes_sent=0, bytes_recv=0, packets_sent=0, packets_recv=0, errin=0, errout=0, dropin=0, dropout=0), 'wlan0': snetio(bytes_sent=188063593, bytes_recv=27125971, packets_sent=195762, packets_recv=128308, errin=0, errout=0, dropin=0, dropout=0)}
2022-04-15 08:10:37,070 - octoprint.plugins.resource_monitor - DEBUG - net_if_addrs() : {'lo': [snicaddr(family=<AddressFamily.AF_INET: 2>, address='127.0.0.1', netmask='255.0.0.0', broadcast=None, ptp=None), snicaddr(family=<AddressFamily.AF_INET6: 10>, address='::1', netmask='ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff', broadcast=None, ptp=None), snicaddr(family=<AddressFamily.AF_PACKET: 17>, address='00:00:00:00:00:00', netmask=None, broadcast=None, ptp=None)], 'wlan0': [snicaddr(family=<AddressFamily.AF_INET: 2>, address='192.168.8.111', netmask='255.255.255.0', broadcast='192.168.8.255', ptp=None), snicaddr(family=<AddressFamily.AF_INET6: 10>, address='fe80::a3d6:e757:e912:2c14%wlan0', netmask='ffff:ffff:ffff:ffff::', broadcast=None, ptp=None), snicaddr(family=<AddressFamily.AF_PACKET: 17>, address='b8:27:eb:fd:db:2d', netmask=None, broadcast='ff:ff:ff:ff:ff:ff', ptp=None)], 'eth0': [snicaddr(family=<AddressFamily.AF_PACKET: 17>, address='b8:27:eb:a8:8e:78', netmask=None, broadcast='ff:ff:ff:ff:ff:ff', ptp=None)]}
2022-04-15 08:10:38,238 - octoprint.plugins.resource_monitor - DEBUG - net_if_stats() : {'lo': snicstats(isup=True, duplex=<NicDuplex.NIC_DUPLEX_UNKNOWN: 0>, speed=0, mtu=65536), 'eth0': snicstats(isup=False, duplex=<NicDuplex.NIC_DUPLEX_UNKNOWN: 0>, speed=65535, mtu=1500), 'wlan0': snicstats(isup=True, duplex=<NicDuplex.NIC_DUPLEX_UNKNOWN: 0>, speed=0, mtu=1500)}
2022-04-15 08:10:38,242 - octoprint.plugins.resource_monitor - DEBUG - sensors_battery() : None
[...]

In order to get more information on this, I added an option to run everything through a profiler (cProfile). Here is the output from the profiler running every second : plugin_resource_monitor_profiling.log

Since there is so much data, I also made a small script that computes the average and maximum cumulative time for every function and displays them in decreasing order : output.txt

The get_all_resources() function is the the function from my plugin that calls all the other functions every second. You can find the code here if needed.

I hope this will help.

Thanks

Renaud11232 avatar Aug 27 '22 14:08 Renaud11232

I noticed the same yesterday, would be interested to find out why

Krokette avatar Sep 17 '22 18:09 Krokette

The get_all_resources() function is the the function from my plugin that calls all the other functions every second. You can find the code here if needed.

I see that your code uses psutil.disk_partitions. There is a known performance regression for it: https://github.com/giampaolo/psutil/issues/2109 Perhaps that's the culprit. Regression was introduced in 5.8.0, so you might try to install 5.7.X and see if the problem persists.

giampaolo avatar Sep 19 '22 19:09 giampaolo

Hello,

Unfortunately, I can't use versions prior to 5.8.0 since this is an OctoPrint dependency (>=5.8,<6 ) and I don't want my plugin to cause conflicts.

However I don't think this problem is linked to disk_partitions. Looking at the cProfile output, the longest it took for disk_partitions to return is 0.075s (0.003s average). I might be wrong but I don't see that being the issue here. I would guess that the issue comes from net_if_stats() which can take up to 1.537s to return (0.545s average). More specifically, the psutil._psutil_linux.net_if_duplex_speed native (max 1.506s, avg 0.540s)

Renaud11232 avatar Sep 20 '22 07:09 Renaud11232

@giampaolo there is a fix for #2109 - WDYT about merging https://github.com/giampaolo/psutil/pull/2110 ?

gozdal avatar Sep 26 '22 08:09 gozdal

@gozdal I think I'm going to remove maxfile / maxpath entirely in the next major version of psutil. Their usefulness is limited and I don't want to over complicate the API because of them.

giampaolo avatar Sep 27 '22 16:09 giampaolo

I did some digging, and possibly narrowed it down to something that changed between the raspberry pi 5.10.63 and 5.10.92 kernels. See above Issue 37 thread for details. (just in case that helps)

don-willingham avatar Oct 11 '22 01:10 don-willingham

maxfile and maxpath fields were remove from the namedtuple returned by disk_partitions() in https://github.com/giampaolo/psutil/pull/2405. This will be available in release 6.0.0.

giampaolo avatar Apr 17 '24 15:04 giampaolo