toybox
Total CPU consumption in top logs goes beyond 800% on an octa-core device
Top command code used: http://www.aospxref.com/android-12.0.0_r3/xref/external/toybox/toys/posix/
Command used: adb shell top
As per the documentation, the cpu field indicates the maximum load the device can take, i.e. 800% for an octa-core. So ideally the sum of all the other components should stay within 800%, i.e. cpu = user+nice+sys+idle+iow+irq+host
But in our stability testing under stress scenarios, we are seeing cpu < (user+nice+sys+idle+iow+irq+host) in the top logs.
Sometimes even a single field (sys% or idle%), or user+sys on its own, goes beyond 1000 or 2000.
Tasks: 761 total, 12 running, 735 sleeping, 0 stopped, 14 zombie
Mem: 11258M total, 9158M used, 2100M free, 8M buffers
Swap: 4095M total, 549M used, 3546M free, 2177M cached
800%cpu 198%user 56%nice 1615%sys 1193%idle 1%iow 94%irq 30%sirq 0%host
  PID  PPID USER     PR  NI[%CPU]S VIRT  RES PCY CMD NAME
 5517     1 root     20   0  125 S  13G  15M  fg adbd adbd
 1718  1027 system   18  -2  103 S  20G 635M  fg system_server system_server
13007  7617 root     20   0 44.6 R  12G 4.7M  fg top top
31741  1030 u0_a212  30  10 36.1 R 2.0G 188M  fg com.jifen.qukan com.jifen.qukan
 4249  1027 u0_a167  20   0 27.6 S  16G 128M  fg rs.media.module com.android.providers.media.module
 2093     2 root     20   0 26.5 R    0    0  fg kworker/u16:3+reverse_migrate_wq [kworker/u16:3+reverse_migrate_wq]
13194  1030 u0_a245  16  -4 24.8 S 2.2G 257M  fg id.article.news com.ss.android.article.news
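To make the mismatch concrete, here is a minimal sketch that adds up the fields of a summary line and compares the sum with the `cpu` budget. The helper names `parse_summary` and `check_budget` are hypothetical, and the parsing only assumes the `NNN%label` layout visible in the logs in this thread; it is not taken from the toybox source.

```python
import re

def parse_summary(line):
    """Map each label in a top CPU summary line to its integer percentage."""
    return {label: int(value) for value, label in re.findall(r"(\d+)%(\w+)", line)}

def check_budget(line):
    """Return (cpu budget, sum of all other fields) for one summary line."""
    fields = parse_summary(line)
    budget = fields.pop("cpu")
    return budget, sum(fields.values())

# Summary line from the stressed octa-core device in this report.
stressed = "800%cpu 198%user 56%nice 1615%sys 1193%idle 1%iow 94%irq 30%sirq 0%host"
# Summary line from the idle 128-core machine quoted later in this thread.
idle = "12800%cpu 3%user 1%nice 7%sys 12789%idle 0%iow 0%irq 0%sirq"

print(check_budget(stressed))  # (800, 3187)
print(check_budget(idle))      # (12800, 12800)
```

On the idle machine the fields (including sirq) add up to the budget exactly, while on the stressed device they come to roughly four times the budget.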
We want to understand how this calculation is made:
- Is cpu = user+nice+sys+idle+iow+irq+host true? If yes, please explain why one or more components have a value far greater than 800.
- What should the ideal values of the cpu fields be when the device is not considered stressed?
- How is it possible for a process to acquire more than 100% CPU? What is a reasonable amount of CPU for a process to hold without being considered a CPU hog?
2022-04-08 02:56:50
Tasks: 709 total, 7 running, 702 sleeping, 0 stopped, 0 zombie
Mem: 11258M total, 8288M used, 2969M free, 6M buffers
Swap: 4095M total, 294M used, 3801M free, 2364M cached
800%cpu 245%user 32%nice 2021%sys 1644%idle 0%iow 111%irq 34%sirq 0%host
  PID  PPID USER     PR  NI[%CPU]S VIRT  RES PCY CMD NAME
19331 18258 system   18  -2  146 S  20G 635M  fg system_server system_server
 1571     1 system   20   0  129 S  12G 2.1M  fg subsystem_ramdu subsystem_ramdump
29948 18066 u0_a262  10 -10 86.3 R 1.8G 127M  ta arme.gamecenter com.wepie.snake.nearme.gamecenter
30243 30239 root     20   0 84.9 R  12G 6.0M  fg procrank procrank
800%cpu 141%user 14%nice 728%sys 130%idle 0%iow 37%irq 10%sirq 0%host
  PID USER     PR  NI VIRT  RES  SHR S[%CPU] %MEM     TIME+ ARGS
 1697 system   18  -2  20G 702M 299M S  100   6.1 133:37.81 system_server
21222 root     20   0  12G 7.2M 2.4M R 69.0   0.0   0:05.91 procrank
I pulled /proc/stat from the device at the same time that user+sys was more than 800 on the octa-core:
cpu 1122738 284058 6935523 13260 2419 444351 140042 0 0 0
cpu0 99078 33191 879297 2438 97 84680 18253 0 0 0
cpu1 100900 28167 889897 2958 133 74481 20293 0 0 0
cpu2 96370 28090 900556 2963 146 72313 16488 0 0 0
cpu3 93710 27962 909937 2976 153 65942 16357 0 0 0
cpu4 160825 33700 858152 1294 617 36466 27863 0 0 0
cpu5 185114 44477 837796 127 551 38311 11037 0 0 0
cpu6 184217 45077 839383 136 561 37741 10352 0 0 0
cpu7 202522 43390 820501 364 157 34413 19397 0 0 0
intr 85522543 0 20637409 8323982 0 0 0 6534023 0 0 0 0 28863496 0 0 0 0 843 435 0 0 0 2152092 48256 5977 1150562 492223 489655 289475 128098 149580 2 0 0 0 0 0 88 1744 160382 89258 0 0 0 444700 0 0 98540 45043 43746 89472 97927 0 10990 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 5963573 10 191 3928 1882 0 0 0 0 0 0 0 0 0 0 70380 0 0 184931 0 0 0 0 0 0 0 0 0 0 0 0 231 164554 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 77938 0 0 16567 0 22 0 6 0 4 0 0 254 855 28 0 0 8 0 0 66 2280 86 468 0 0 0 0 424 0 0 0 78 295542 111 0 0 10 0 240 0 0 0 0 0 0 0 0 0 0 0 0 101 0 0 0 0 0 963 6 0 0 20944 4 3 20378 7402 0 0 0 0 0 0 0 45 75 144 220 220 0 0 1 1 0 0 1 1 0 0 0 1 1 0 0 0 2 0 0 0 6963346 0 2210 29678 8776 20374 0 0 1265930 0 468 475 817 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 45337 1 0 0 0 0 8 1943
ctxt 145951240
btime 1652860601
processes 786596
procs_running 51
procs_blocked 0
softirq 24546204 165397 8919978 2802 379318 7550233 0 540192 1533916 36 5454332
Those numbers count up monotonically until the system is rebooted (it's total accumulated ticks in each category). Top diffs two readings, subtracting the old one from the new one to see what changed, so I can't tell much from one reading in isolation.
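As a rough illustration of that diffing step, the sketch below parses the aggregate `cpu` line from two /proc/stat snapshots and scales the per-field tick deltas so they add up to `ncpu * 100`. It is not toybox's code: the function names are hypothetical, the mapping of the eighth counter to the `host` label is assumed from the header shown in this thread, and dividing by the summed ticks is only one possible normalisation (this thread does not establish which denominator top actually uses). The second snapshot is made up for the example.

```python
# First eight counters of the aggregate "cpu" line, named after top's header.
FIELDS = ("user", "nice", "sys", "idle", "iow", "irq", "sirq", "host")

def parse_cpu_line(stat_text):
    """Return the first eight tick counters from the aggregate 'cpu' line."""
    for line in stat_text.splitlines():
        parts = line.split()
        if parts and parts[0] == "cpu":
            return dict(zip(FIELDS, map(int, parts[1:9])))
    raise ValueError("no aggregate cpu line found")

def usage_percent(old, new, ncpu):
    """Scale per-field tick deltas so that they sum to ncpu * 100."""
    delta = {name: new[name] - old[name] for name in FIELDS}
    total = sum(delta.values())
    if total <= 0:
        raise ValueError("readings are identical or out of order")
    return {name: delta[name] * ncpu * 100 / total for name in FIELDS}

# First reading: the aggregate line posted above.
old = parse_cpu_line("cpu 1122738 284058 6935523 13260 2419 444351 140042 0 0 0")
# Second reading: HYPOTHETICAL values (+200 user, +500 sys, +100 idle ticks).
new = parse_cpu_line("cpu 1122938 284058 6936023 13360 2419 444351 140042 0 0 0")

pct = usage_percent(old, new, ncpu=8)
print(pct["user"], pct["sys"], pct["idle"])  # 200.0 500.0 100.0
```

With this normalisation the fields cannot exceed the budget by construction, so output like the logs above would suggest the real denominator is something other than the summed deltas.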
fwiw, since you asked the question out loud on your blog, here's toybox top on an idle 128-core machine:
~/toybox$ ./toybox nproc
128
~/toybox$ ./toybox top
Tasks: 1049 total, 1 running, 1048 sleeping, 0 stopped, 0 zombie
Mem: 515961M total, 482873M used, 33088M free, 4544M buffers
Swap: 1907M total, 105M used, 1802M free, 454053M cached
12800%cpu 3%user 1%nice 7%sys 12789%idle 0%iow 0%irq 0%sirq
amusingly, given my usual complaint about "MiB are no longer a useful unit on high-end phones, let alone high-end workstations", procps top is actually worse --- i really don't need the extra decimal place making the ridiculously large number of megabytes even harder to read :-)
top - 18:36:54 up 3 days, 18:05, 1 user, load average: 0.23, 0.19, 0.13
Tasks: 1048 total, 1 running, 1047 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.0 us, 0.2 sy, 0.0 ni, 99.8 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 515961.4 total, 33105.5 free, 6278.8 used, 476577.1 buff/cache
MiB Swap: 1908.0 total, 1802.5 free, 105.5 used. 506242.9 avail Mem
I can make the system auto-adjust to show gigabytes, but where's the cutoff? (More than 64 gigs? Aesthetic issues do not have empirical rules.)
And at least that system doesn't add up to more than 100% CPU usage. Possibly something is getting counted twice and one or more other columns need to be subtracted from "system", but I have yet to reproduce the issue locally so I can examine it...
(sorry for derailing this thread, but, yeah, that's the hard part ... it seems obvious and unobjectionable to me that "four digits of megabytes is one too many", which fixes this case, but i think we've always struggled to agree on whether a 12 GiB device should use GiB or MiB, because 100MiB out of 1GiB is logarithmically more than 100MiB out of 10GiB, if you see what i mean [and i think you do, because you're the one who objects that low four-digit MiB should still be in MiB]. personally i have a fondness for 4GiB being the "okay, now it's gigabytes", if only because that's where Android's "low memory vs high memory" split currently is. fwiw, a 32 GiB cutoff would get all my desktops, 16 GiB all my laptops, 6 GiB all my high-end phones.)
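For what it's worth, the cutoff being debated boils down to one parameter. A minimal sketch, assuming a hypothetical `format_mem` helper (not toybox or procps code), the 4 GiB default mentioned above, and round-to-nearest when switching to gigabytes:

```python
def format_mem(mib, cutoff_gib=4):
    """Render a MiB count as 'NNNM' below the cutoff and 'NNNG' at or above it."""
    if mib >= cutoff_gib * 1024:
        return f"{round(mib / 1024)}G"
    return f"{mib}M"

print(format_mem(1907))               # 1907M
print(format_mem(11258))              # 11G
print(format_mem(515961))             # 504G
print(format_mem(11258, cutoff_gib=32))  # 11258M
```

Moving `cutoff_gib` between 4, 6, 16, and 32 reproduces the alternatives listed above; the disagreement is about the default, not the mechanism.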
Thanks, Landley, for your inputs.
"And at least that system doesn't add up to more than 100% CPU usage. Possibly something is getting counted twice and one or more other columns need to be subtracted from "system", but I have yet to reproduce the issue locally so I can examine it..."
So this seems to be a bug, then? With respect to reproducibility, we are running heavy stress tests for long hours.