
Investigate cpu performance

Open ZubairLK opened this issue 6 years ago • 9 comments

It would be nice to investigate CPU/IO/memory/network performance and compare it to standard distributions on the same devices. There might be overheads on our side that we can reduce.

ZubairLK avatar Feb 18 '19 16:02 ZubairLK

pi0w, no app container, 2.29.2

Even after 5 minutes, the supervisor (node) and the balena daemon (balenad) still seem to be doing something, and the CPU shows 0% idle:

Mem: 254004K used, 237460K free, 3864K shrd, 19484K buff, 100412K cached
CPU:  87% usr  12% sys   0% nic   0% idle   0% io   0% irq   0% sirq
Load average: 2.07 2.24 1.62 2/143 1510
  PID  PPID USER     STAT   VSZ %VSZ %CPU COMMAND
 1081  1069 root     R     154m  32%  62% node /usr/src/app/dist/app.js
  782     1 root     S     873m 182%  25% /usr/bin/balenad --experimental --log-driver=journald -s aufs -H fd:// -H unix:///var/run/balena.sock -H unix:///var/run/balena-engine.sock -H
  647     1 root     S     9868   2%   5% @sbin/plymouthd --tty=tty1 --mode=boot --pid-file=/run/plymouth/pid --attach-to-session --kernel-command-line=plymouth.ignore-serial-consoles s
 1487   666 root     R     2992   1%   4% top
  780     1 root     S     857m 178%   1% /usr/bin/balenad --delta-data-root=/mnt/sysroot/active/balena --delta-storage-driver=aufs --log-driver=journald -s aufs --data-root=/mnt/sysroo
  848   782 root     S     856m 178%   1% balena-engine-containerd --config /var/run/balena-engine/containerd/containerd.toml

After pushing a simple Raspbian container that just sleeps, CPU usage has settled down. plymouthd still seems to be eating some unnecessary CPU cycles.

Mem: 400416K used, 91048K free, 3960K shrd, 30104K buff, 221028K cached
CPU:   4% usr   4% sys   0% nic  90% idle   0% io   0% irq   0% sirq
Load average: 2.55 3.25 2.58 1/154 1831
  PID  PPID USER     STAT   VSZ %VSZ %CPU COMMAND
  647     1 root     S     9868   2%   4% @sbin/plymouthd --tty=tty1 --mode=boot --pid-file=/run/plymouth/pid --attach-to-session --kernel-command-line=plymouth.ignore-serial-consoles s
 1831   666 root     R     2992   1%   2% top
  782     1 root     S     887m 185%   1% /usr/bin/balenad --experimental --log-driver=journald -s aufs -H fd:// -H unix:///var/run/balena.sock -H unix:///var/run/balena-engine.sock -H
  848   782 root     S     865m 180%   1% balena-engine-containerd --config /var/run/balena-engine/containerd/containerd.toml
  842   780 root     S     865m 180%   1% balena-engine-containerd --config /var/run/balena-host/containerd/containerd.toml
  780     1 root     S     857m 178%   1% /usr/bin/balenad --delta-data-root=/mnt/sysroot/active/balena --delta-storage-driver=aufs --log-driver=journald -s aufs --data-root=/mnt/sysroo
 1081  1069 root     S     140m  29%   1% node /usr/src/app/dist/app.js
 1486     2 root     IW       0   0%   1% [kworker/0:0]
 1376     2 root     IW       0   0%   1% [kworker/0:3]
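
To compare the two snapshots at a glance, the `CPU:` summary lines can be reduced to a single "busy" figure. This is only a minimal sketch: the helper name `parse_cpu_line` is made up for illustration, and it assumes the busybox-style `NN% field` layout shown in the top output above.

```python
import re


def parse_cpu_line(line: str) -> dict[str, int]:
    """Parse a busybox-top style 'CPU:' summary line into {field: percent}."""
    # Each entry looks like ' 87% usr'; capture the number and the field name.
    return {name: int(pct) for pct, name in re.findall(r"(\d+)%\s+(\w+)", line)}


# The two summary lines copied from the top output in this comment.
before = parse_cpu_line(
    "CPU:  87% usr  12% sys   0% nic   0% idle   0% io   0% irq   0% sirq"
)
after = parse_cpu_line(
    "CPU:   4% usr   4% sys   0% nic  90% idle   0% io   0% irq   0% sirq"
)

for label, sample in (("no app container", before), ("sleeping container", after)):
    # Treat everything that is not idle as busy time.
    busy = 100 - sample["idle"]
    print(f"{label}: {busy}% busy ({sample['usr']}% usr, {sample['sys']}% sys)")
```

The percentages in top are rounded, so the individual fields do not always add up to exactly 100.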

Inside an rpi-raspbian container, after apt-get update && apt-get install sysbench:

root@62d0b9a:/usr/src/app# sysbench --test=cpu run
sysbench 0.4.12:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 1

Doing CPU performance benchmark

Threads started!
Done.

Maximum prime number checked in CPU test: 10000


Test execution summary:
    total time:                          264.7861s
    total number of events:              10000
    total time taken by event execution: 264.7184
    per-request statistics:
         min:                                 22.91ms
         avg:                                 26.47ms
         max:                                121.75ms
         approx.  95 percentile:              38.18ms

Threads fairness:
    events (avg/stddev):           10000.0000/0.00
    execution time (avg/stddev):   264.7184/0.00

root@62d0b9a:/usr/src/app# 

On raspbian lite,

pi@raspberrypi:~$ uname -a
Linux raspberrypi 4.14.79+ #1159 Sun Nov 4 17:28:08 GMT 2018 armv6l GNU/Linux
pi@raspberrypi:~$ cat /etc/rpi-issue 
Raspberry Pi reference 2018-11-13
Generated using pi-gen, https://github.com/RPi-Distro/pi-gen, 7e0c786c641ba15990b5662f092c106beed40c9f, stage2

pi@raspberrypi:~$ sysbench --test=cpu run
sysbench 0.4.12:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 1

Doing CPU performance benchmark

Threads started!
Done.

Maximum prime number checked in CPU test: 10000


Test execution summary:
    total time:                          228.4990s
    total number of events:              10000
    total time taken by event execution: 228.4667
    per-request statistics:
         min:                                 22.76ms
         avg:                                 22.85ms
         max:                                 33.24ms
         approx.  95 percentile:              22.96ms

Threads fairness:
    events (avg/stddev):           10000.0000/0.00
    execution time (avg/stddev):   228.4667/0.00
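
A rough way to quantify the gap between the two sysbench runs above is to express each container figure relative to the Raspbian Lite figure. This is a sketch under the assumption that these single runs are representative; `relative_overhead` is a hypothetical helper, and the numbers are copied from the two summaries.

```python
def relative_overhead(measured: float, baseline: float) -> float:
    """Return how much larger `measured` is than `baseline`, in percent."""
    return (measured - baseline) / baseline * 100.0


# Figures copied from the two sysbench summaries above.
container = {"total_s": 264.7861, "min_ms": 22.91, "avg_ms": 26.47, "max_ms": 121.75}
raspbian = {"total_s": 228.4990, "min_ms": 22.76, "avg_ms": 22.85, "max_ms": 33.24}

for key in container:
    pct = relative_overhead(container[key], raspbian[key])
    print(f"{key}: container {container[key]} vs Raspbian Lite {raspbian[key]} ({pct:+.1f}%)")
```

The minimum per-request times are almost identical while the average, maximum and 95th percentile are noticeably higher in the container run. That would be consistent with the benchmark being preempted by other processes rather than each event running slower, although a single run per system is not enough to be sure.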

ZubairLK avatar Feb 18 '19 16:02 ZubairLK

So CPU load is unusually high on my pi0 on balenaOS:

15:09:46 up 0:15, 1 user, load average: 0.78, 0.84, 0.66

It's sensible on the pi0 on Raspbian Lite:

15:16:54 up 21:29, 2 users, load average: 0.07, 0.19, 0.16

I have a docker container running a sleep on both.

I'm going to try and bring down various services in the os to see what is causing it.
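
For repeating this comparison while services are stopped one at a time, the load averages can be pulled out of the uptime output programmatically. A minimal sketch, assuming the `load average: a, b, c` layout of the two lines quoted in this comment; the function name `load_averages` is made up for illustration.

```python
import re


def load_averages(uptime_line: str) -> tuple[float, float, float]:
    """Extract the 1, 5 and 15 minute load averages from an uptime line."""
    match = re.search(
        r"load average:\s*([\d.]+),\s*([\d.]+),\s*([\d.]+)", uptime_line
    )
    if match is None:
        raise ValueError(f"no load average found in: {uptime_line!r}")
    one, five, fifteen = (float(value) for value in match.groups())
    return one, five, fifteen


# The two uptime lines quoted in this comment.
balena = load_averages("15:09:46 up 0:15, 1 user, load average: 0.78, 0.84, 0.66")
raspbian = load_averages("15:16:54 up 21:29, 2 users, load average: 0.07, 0.19, 0.16")

for window, b, r in zip(("1 min", "5 min", "15 min"), balena, raspbian):
    print(f"{window}: balenaOS {b:.2f} vs Raspbian Lite {r:.2f}")
```

One caveat when reading these numbers: the balenaOS device had only been up for 15 minutes, so its 15-minute average still covers the boot period, whereas the Raspbian Lite device had been up for over 21 hours.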

ZubairLK avatar Feb 19 '19 15:02 ZubairLK

note to self: check out health monitoring using netdata.

ZubairLK avatar Feb 22 '19 12:02 ZubairLK

[jakogut] This issue has attached support thread https://jel.ly.fish/d8c3be4e-68a1-4943-9a8a-e27f9e5bd26d

jellyfish-bot avatar Oct 13 '21 16:10 jellyfish-bot

I investigated a little bit and the high initial cpu usage (at least on my system - RPI3) originates from rngd.

Further investigation showed that, for some reason, the jitter source was used even when the hardware randomness source hwrng was available. To solve this on Raspberry Pi systems, one could change the command that runs rngd from /usr/sbin/rngd -f -r /dev/hwrng to /usr/sbin/rngd -f -r /dev/hwrng -x jitter.
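
The idea of only excluding the jitter source when a hardware source exists can be sketched as follows. This is an illustration, not how the OS actually launches rngd: the function `rngd_command` is hypothetical, the flags are the ones quoted in this comment, and the fallback of running plain `rngd -f` without a hardware device is an assumption.

```python
import os
import shlex


def rngd_command(hwrng_path: str = "/dev/hwrng") -> list[str]:
    """Build an rngd command line, excluding jitter only if hwrng exists."""
    cmd = ["/usr/sbin/rngd", "-f"]
    if os.path.exists(hwrng_path):
        # Hardware source present: read from it and skip the jitter source.
        cmd += ["-r", hwrng_path, "-x", "jitter"]
    # Assumption: without a hardware source, leave rngd's sources untouched.
    return cmd


print(shlex.join(rngd_command()))
```

On a device with /dev/hwrng this prints the modified command proposed here; elsewhere it leaves the jitter source enabled so that some entropy source remains available.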

Sadly I know exactly nothing about yocto, so I cannot implement this myself

Tom-Julux avatar Aug 08 '22 18:08 Tom-Julux

Hi @Tom-Julux, what I think happens is that all entropy sources need to be initialized before the best-performing (hardware) one is chosen. So the jitter source also has to be initialized (unless you pass the -x jitter option), even though it won't be used when a hardware entropy source is available. I will look into adding -x jitter for device types that have a hardware entropy source, to reduce boot time.

alexgg avatar Aug 31 '22 15:08 alexgg

[cywang117] This has attached https://jel.ly.fish/a13bcce9-7878-465c-940f-cd12b1d27a37

jellyfish-bot avatar Oct 13 '22 18:10 jellyfish-bot

It might be worth pulling a new version of rngd, too: my Pi Zeros still have 6.15 despite being on the latest production balenaOS, and the 6.16 changelog includes "Fix jitterentropy long timeout failures on low power hardware".

srd424 avatar Mar 11 '24 14:03 srd424

Hey @srd424, the version of rngd comes from Poky Kirkstone. We will update it as part of the move to the new Scarthgap LTS release, which is scheduled for April.

alexgg avatar Mar 11 '24 14:03 alexgg