tallow eating 100% of 1 CPU thread

Open grahamwhaley opened this issue 5 years ago • 9 comments

I noticed in top that tallow was consuming 100% of a CPU thread.

A gdb attach shows its stack as:

(gdb) where
#0  0x00007fa3ae02fd4c in ?? () from /usr/lib64/libsystemd.so.0
#1  0x00007fa3ae0301be in ?? () from /usr/lib64/libsystemd.so.0
#2  0x00007fa3ae03e057 in sd_journal_get_data () from /usr/lib64/libsystemd.so.0
#3  0x0000555ba651b775 in ?? ()
#4  0x00007fa3ae1472c3 in __libc_start_main () from /usr/lib64/haswell/libc.so.6
#5  0x0000555ba651baee in ?? ()

A continue/stop then showed it as:

(gdb) where
#0  0x00007fa3ae0ca578 in ?? () from /usr/lib64/libpcre.so.1
#1  0x00007fa3ae0db08b in pcre_exec () from /usr/lib64/libpcre.so.1
#2  0x0000555ba651b801 in ?? ()
#3  0x00007fa3ae1472c3 in __libc_start_main () from /usr/lib64/haswell/libc.so.6
#4  0x0000555ba651baee in ?? ()

The tallow journal looks like:

# journalctl -u tallow
-- Logs begin at Thu 2019-10-31 10:18:08 GMT, end at Fri 2019-11-01 17:28:18 GMT. --
Oct 31 13:57:04 skull tallow[216312]: Journal was rotated, resetting
Oct 31 18:00:40 skull systemd[1]: Stopping Tallow Service...
Oct 31 18:00:40 skull systemd[1]: tallow.service: Succeeded.
Oct 31 18:00:40 skull systemd[1]: Stopped Tallow Service.
-- Reboot --
Nov 01 09:53:49 skull systemd[1]: Started Tallow Service.
Nov 01 09:53:49 skull tallow[397]: /usr/share/tallow/sshd.json: 10 patterns
Nov 01 09:53:49 skull tallow[397]: Skipped reading /etc/tallow: No such file or directory
Nov 01 09:53:49 skull tallow[397]: Loaded 10 patterns total
Nov 01 09:53:49 skull tallow[397]: tallow 18 Started
Nov 01 10:06:34 skull tallow[397]: Journal was rotated, resetting
Nov 01 10:46:00 skull systemd[1]: Stopping Tallow Service...
Nov 01 10:46:00 skull systemd[1]: tallow.service: Succeeded.
Nov 01 10:46:00 skull systemd[1]: Stopped Tallow Service.
Nov 01 10:46:00 skull systemd[1]: Started Tallow Service.
Nov 01 10:46:00 skull tallow[134447]: /usr/share/tallow/sshd.json: 10 patterns
Nov 01 10:46:00 skull tallow[134447]: Skipped reading /etc/tallow: No such file or directory
Nov 01 10:46:00 skull tallow[134447]: Loaded 10 patterns total
Nov 01 10:46:00 skull tallow[134447]: tallow 18 Started
Nov 01 14:44:35 skull tallow[134447]: Journal was rotated, resetting

The only 'interesting' thing on this machine is that it is running a single node k8s cluster.

The machine is:

# cat /etc/os-release
NAME="Clear Linux OS"
VERSION=1
ID=clear-linux-os
ID_LIKE=clear-linux-os
VERSION_ID=31460
PRETTY_NAME="Clear Linux OS"
ANSI_COLOR="1;35"
HOME_URL="https://clearlinux.org"
SUPPORT_URL="https://clearlinux.org"
BUG_REPORT_URL="mailto:[email protected]"
PRIVACY_POLICY_URL="http://www.intel.com/privacy"

grahamwhaley avatar Nov 01 '19 17:11 grahamwhaley

What happens when you kill tallow?

- Does another process rise in CPU?
- Does idle CPU increase?
- What does your process tree look like when this occurs?
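To put numbers on the CPU questions above, one option is to sample `/proc/<pid>/stat` twice and compare the accumulated user and system time. The following is a minimal sketch, assuming a Linux system; the helper names `cpu_ticks` and `cpu_percent` are hypothetical and not part of tallow.

```python
import os
import time


def cpu_ticks(stat_line):
    """Return utime + stime (in clock ticks) from a /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the last ')' before tokenising the rest.
    fields = stat_line.rsplit(")", 1)[1].split()
    # fields[0] is field 3 (state); utime and stime are fields 14 and 15.
    return int(fields[11]) + int(fields[12])


def cpu_percent(pid, interval=1.0):
    """Sample a process twice; return CPU use as a percentage of one CPU."""
    def read():
        with open(f"/proc/{pid}/stat") as f:
            return cpu_ticks(f.read())

    ticks_per_second = os.sysconf("SC_CLK_TCK")
    before = read()
    time.sleep(interval)
    after = read()
    return 100.0 * (after - before) / ticks_per_second / interval
```

Calling `cpu_percent(pid)` with tallow's PID before and after a restart gives a figure comparable to what top reports; a value close to 100 corresponds to one fully busy thread.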

chuckn408 avatar Feb 18 '20 09:02 chuckn408

Oh, that was months ago ;-), and I don't think I've (knowingly) seen it since. IIRC, I killed/restarted tallow, and it behaved again. I think we considered whether it had something to do with log rotation/wrap at the time. Maybe @ahkok remembers or has some further ideas...

grahamwhaley avatar Feb 18 '20 09:02 grahamwhaley

I encountered the issue myself a few months back, which is why I'm keeping this open. I have not yet determined whether this bug is gone now (e.g. due to some of the recent large changes) or not.

ahkok avatar Feb 18 '20 16:02 ahkok

Hi. I came across the exact same issue as @grahamwhaley did. The problem was solved by killing and restarting tallow. I have noticed that it happens when you are in an SSH session for too long (over 30 hours; don't ask me why I was in a session for so long). This issue is still a problem on the newest Clear Linux OS, and I have seen the error on my system multiple times now. It would be really nice if you could take a look at it ;-)

My machine:

NAME="Clear Linux OS"
VERSION=1
ID=clear-linux-os
ID_LIKE=clear-linux-os
VERSION_ID=32820
PRETTY_NAME="Clear Linux OS"
ANSI_COLOR="1;35"
HOME_URL="https://clearlinux.org"
SUPPORT_URL="https://clearlinux.org"
BUG_REPORT_URL="mailto:[email protected]"
PRIVACY_POLICY_URL="http://www.intel.com/privacy"
BUILD_ID=32820

NicolaiThagaard avatar Apr 12 '20 21:04 NicolaiThagaard

I second that in 2022 on Clear Linux. The symptom is that tallow restarts, who knows why, and then takes 100% of its little thread. Thank God CPUs have more cores nowadays! :)

gybfefe avatar Jan 18 '22 12:01 gybfefe

Not good

image

Also not good

image

Not good as well (average is about 77 °C), Intel NUC i7.

image

EDIT: Intel NUC server at home, no firewalls, no access from the outside (only 80/443 ports for obvious reasons).

erkexzcx avatar Feb 11 '22 14:02 erkexzcx

Same issue here. I noticed performance was a bit off on a huge batch job in Ruby that I run frequently, and it turns out tallow was using an entire core. This is on a VPS that I access over SSH. I had been running the job during the past ~72 hours, but I recently rebooted the VPS and started the job again. This is the first time I've noticed tallow running away, but I have noticed performance inconsistencies in the past too, so I'll keep an eye on it.

geckolinux avatar Jan 19 '23 14:01 geckolinux

I also noticed tallow eating up a core's worth of CPU. Sent a SIGTERM to the process, and when it restarted it was using "normal" resources.

image

Information (reduced):

$ cat /etc/os-release
NAME="Clear Linux OS"
VERSION=1
ID=clear-linux-os
VERSION_ID=39050
BUILD_ID=39050
$ lscpu
Architecture:           x86_64
  CPU op-mode(s):       32-bit, 64-bit
  Address sizes:        39 bits physical, 48 bits virtual
  Byte Order:           Little Endian
CPU(s):                 2
  On-line CPU(s) list:  0,1
Vendor ID:              GenuineIntel
  Model name:           Intel(R) Celeron(R) CPU G3930 @ 2.90GHz
    Thread(s) per core: 1
    Core(s) per socket: 2
    Socket(s):          1
    Stepping:           9

Renegade-Master avatar May 06 '23 20:05 Renegade-Master

I just observed this as well

$ cat /etc/os-release 
NAME="Clear Linux OS"
VERSION=1
ID=clear-linux-os
ID_LIKE=clear-linux-os
VERSION_ID=39930
PRETTY_NAME="Clear Linux OS"
ANSI_COLOR="1;35"
HOME_URL="https://clearlinux.org"
SUPPORT_URL="https://clearlinux.org"
BUG_REPORT_URL="mailto:[email protected]"
PRIVACY_POLICY_URL="http://www.intel.com/privacy"
BUILD_ID=39930
$ lscpu 
Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         46 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  36
  On-line CPU(s) list:   0-35
Vendor ID:               GenuineIntel
  Model name:            Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz
    CPU family:          6
    Model:               85
    Thread(s) per core:  2
    Core(s) per socket:  18
    Socket(s):           1
    Stepping:            7
    CPU(s) scaling MHz:  72%
    CPU max MHz:         4800.0000
    CPU min MHz:         1200.0000

chriselrod avatar Sep 10 '23 23:09 chriselrod