
unstable: high cpu usage by /sbin/urngd

Open lePereT opened this issue 5 years ago • 14 comments

Hi all, I'm getting a lot of instability on macOS Mojave, running Docker version 19.03.8 and docker-machine version 0.16.2.

If I just use the Readme command:

docker run --rm -it openwrtorg/rootfs

I get a number of error messages during launch:

rich$ docker run --rm -it openwrtorg/rootfs
Failed to resize receive buffer: Operation not permitted
ip: RTNETLINK answers: Operation not permitted
Press the [f] key and hit [enter] to enter failsafe mode
Press the [1], [2], [3] or [4] key and hit [enter] to select the debug level
ip: can't send flush request: Operation not permitted
ip: SIOCSIFFLAGS: Operation not permitted
Please press Enter to activate this console.

When in the shell, it's sluggish, and I notice that one core of my CPU is being used at 100%. A top inside the container reveals the following:

Mem: 433964K used, 579256K free, 290552K shrd, 9536K buff, 323160K cached
CPU:  99% usr   0% sys   0% nic   0% idle   0% io   0% irq   0% sirq
Load average: 0.99 0.58 0.24 2/163 817
  PID  PPID USER     STAT   VSZ %VSZ %CPU COMMAND
   92     1 root     R      780   0% 100% /sbin/urngd
  279     1 root     S     1300   0%   0% /sbin/rpcd -s /var/run/ubus.sock -t 30
  434     1 root     S     1196   0%   0% /sbin/netifd
    1     0 root     S     1116   0%   0% /sbin/procd
   76     1 root     S     1084   0%   0% /bin/ash --login

Am I doing something wrong?

lePereT avatar Apr 12 '20 18:04 lePereT

Thanks for the report, I've never touched urngd but maybe @ynezz has a clue...

aparcar avatar Apr 13 '20 08:04 aparcar

So, quickly typing a killall /sbin/urngd after terminal access is gained appears to make urngd behave. Not ideal. Also, what are the following error messages all about?

Failed to resize receive buffer: Operation not permitted
ip: RTNETLINK answers: Operation not permitted
...
ip: can't send flush request: Operation not permitted
ip: SIOCSIFFLAGS: Operation not permitted

lePereT avatar Apr 13 '20 11:04 lePereT

Just to confirm that the problem persists with an Ubuntu 18.04 VM as host

Mem: 865520K used, 143284K free, 984K shrd, 34440K buff, 579772K cached
CPU:  99% usr   0% sys   0% nic   0% idle   0% io   0% irq   0% sirq
Load average: 0.39 0.11 0.04 4/154 711
  PID  PPID USER     STAT   VSZ %VSZ %CPU COMMAND
   91     1 root     R      776   0%  99% /sbin/urngd
  444     1 root     S     1208   0%   0% /sbin/netifd
    1     0 root     S     1176   0%   0% /sbin/procd

lePereT avatar Apr 13 '20 12:04 lePereT

I can't reproduce the error. Did you try to reproduce it on other machines?

aparcar avatar Apr 28 '20 22:04 aparcar

I can't reproduce that error even on Ubuntu 18.04 (but with a 5.6.7 kernel). It would help to get strace output from urngd while it's in this state. That should be as easy as running opkg update; opkg install strace; strace --no-abbrev --attach $(pidof urngd) inside a container spawned with docker run --cap-add SYS_PTRACE --rm -it openwrtorg/rootfs

ynezz avatar May 06 '20 05:05 ynezz

i'll attempt to do this in the next week or so. i'll close the issue for now to prevent noise :) thanks for both your responses

lePereT avatar May 06 '20 18:05 lePereT

I would like to reopen this issue.

I am running into the same bug when OpenWrt runs in a Docker container that does not allow the RNDADDENTROPY ioctl on /dev/random.

This causes an infinite busy loop with high CPU usage: the WRITE poll event keeps triggering, and it can never be satisfied because the ioctl is refused every time.

Should I provide a possible fix? I would simply stop the polling for a certain amount of time in case RNDADDENTROPY fails.
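
To make the idea concrete, here is a rough Python model of "pause after a refused RNDADDENTROPY instead of retrying immediately". It is only a sketch of the approach described above, not the patch that was posted to the mailing list and not urngd's actual code. The names feed_entropy and always_denied are made up for illustration, and the ioctl is replaced by a plain callable that raises PermissionError.

```python
import time
from typing import Callable


def feed_entropy(
    add_entropy: Callable[[], None],
    rounds: int,
    backoff_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call add_entropy up to `rounds` times, pausing after each refusal.

    add_entropy stands in for the RNDADDENTROPY ioctl (an assumption for
    this sketch); it should raise PermissionError when the kernel says no.
    """
    stats = {"ok": 0, "denied": 0, "slept_s": 0.0}
    for _ in range(rounds):
        try:
            add_entropy()
            stats["ok"] += 1
        except PermissionError:
            # Without this pause the loop would retry at once and spin
            # at 100% CPU, which is the behaviour reported in this issue.
            stats["denied"] += 1
            sleep(backoff_s)
            stats["slept_s"] += backoff_s
    return stats


def always_denied() -> None:
    """Hypothetical stand-in for an ioctl that is always refused."""
    raise PermissionError("RNDADDENTROPY: Operation not permitted")


if __name__ == "__main__":
    # A no-op sleep keeps the demonstration instantaneous.
    print(feed_entropy(always_denied, rounds=3, sleep=lambda s: None))
```

The sleep function is injectable only so the behaviour can be checked without waiting. In a real daemon the pause would more likely be a timer in the event loop than a blocking sleep.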

thg2k avatar Jan 30 '21 11:01 thg2k

I have the same issue with an Ubuntu 18.04 VM and OpenWrt (19.07.02) in a Docker container.

databill avatar Feb 07 '21 08:02 databill

@thg2k please provide a fix

aparcar avatar Feb 07 '21 08:02 aparcar

@aparcar I did, but it was refused by the maintainer.

http://lists.openwrt.org/pipermail/openwrt-devel/2021-January/033587.html

It is indeed a very bad workaround but it solves the problem without causing any regression damage and it's easy to audit. A better fix would be to use uloop timers and improve logging but I have no interest in spending more time on this. It is still a fix and I recommend merging it.

thg2k avatar Feb 07 '21 09:02 thg2k

I got this problem on my MT7621 router too; maybe there is something wrong with the source code.

cyijun avatar Jun 20 '21 04:06 cyijun

I ran into this same problem when using PVE to run OpenWrt in a Linux container. According to the random(4) Linux manual page, the CAP_SYS_ADMIN capability is required for almost all related ioctl requests.

I had included the default OpenWrt config file (the same as this lxc-template), which contains lxc.cap.drop = sys_admin. After I removed this line, /sbin/urngd no longer pegged my CPU.

I think there is also a way to grant the SYS_ADMIN capability to a Docker container, but that capability is overloaded (it covers many unrelated privileges), so the decision is yours.

Moreover, it seems that just uninstalling the urngd package could also solve this problem, but I'm not sure about the side effects.
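
If you want to check whether a process actually holds CAP_SYS_ADMIN, one way is to read the CapEff mask from /proc/<pid>/status and test bit 21, which is the CAP_SYS_ADMIN index in linux/capability.h. The sketch below assumes a Linux system with Python available, which a stock OpenWrt image may not have. The function names are made up for illustration.

```python
CAP_SYS_ADMIN = 21  # bit index of CAP_SYS_ADMIN in <linux/capability.h>


def parse_cap_eff(status_text: str) -> int:
    """Return the effective capability mask from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            # The value is a hexadecimal bitmask without a 0x prefix.
            return int(line.split()[1], 16)
    raise ValueError("no CapEff line found")


def has_capability(mask: int, bit: int) -> bool:
    """True when the capability with the given bit index is set in mask."""
    return bool((mask >> bit) & 1)


def current_process_has_sys_admin() -> bool:
    """Check the calling process; Linux only, since it reads /proc."""
    with open("/proc/self/status") as fh:
        return has_capability(parse_cap_eff(fh.read()), CAP_SYS_ADMIN)


if __name__ == "__main__":
    sample = "Name:\tash\nCapEff:\t0000003fffffffff\n"
    print(has_capability(parse_cap_eff(sample), CAP_SYS_ADMIN))
```

Calling current_process_has_sys_admin() inside the container tells you whether the capability was dropped. The sample string in the demo is an illustrative mask, not output captured from this issue.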

Haizs avatar Feb 26 '22 08:02 Haizs

I ran into this problem today on a Linksys WRT1900ACS which has an uptime of 248 days, running:

~# cat /etc/openwrt_release 
DISTRIB_ID='OpenWrt'
DISTRIB_RELEASE='21.02.0'
DISTRIB_REVISION='r16279-5cc0535800'
DISTRIB_TARGET='mvebu/cortexa9'
DISTRIB_ARCH='arm_cortex-a9_vfpv3-d16'
DISTRIB_DESCRIPTION='OpenWrt 21.02.0 r16279-5cc0535800'
DISTRIB_TAINTS=''

Suddenly, at around 1am, my load jumped (screenshot: Screenshot_2022-06-27_11-45-53).

Killing urngd helped. But restarting it brought the load back up again. So, now I've killed urngd without restarting it. I will keep the system up to see if there are any impacts of having urngd stopped.

What, by the way, could be using urngd? Maybe those processes just need a restart. Perhaps dnsmasq? Anything else? Does OLSRd or babeld use urngd?

pmelange avatar Jun 27 '22 09:06 pmelange

It looks like I am also seeing this on a TP-Link Archer C7 v2.

root@foobar:~# cat /etc/openwrt_release
DISTRIB_ID='OpenWrt'
DISTRIB_RELEASE='19.07.2'
DISTRIB_REVISION='r10947-65030d81f3'
DISTRIB_TARGET='ar71xx/generic'
DISTRIB_ARCH='mips_24kc'
DISTRIB_DESCRIPTION='OpenWrt 19.07.2 r10947-65030d81f3'
DISTRIB_TAINTS=''


bantu avatar Sep 15 '22 21:09 bantu