twirssi
Irssi uses 100% CPU time
Hello,
I have this issue where Irssi starts using 100% CPU. The time seems to correlate with the following messages from the Twirssi debug log:
03:21:08   'timeline' => '460572445892571136'
03:21:08 }
03:21:08 };
03:26:03 get_updates starting upd=1
03:26:03 bg_setup starting upd=1
03:26:08 checking child log at /tmp/tw_9366_Kbf6.done mtime=1398644763 v 0
03:26:13 checking child log at /tmp/tw_9366_Kbf6.done mtime=1398644763 v 1398644763
03:26:18 checking child log at /tmp/tw_9366_Kbf6.done mtime=1398644763 v 1398644763
03:26:23 checking child log at /tmp/tw_9366_Kbf6.done mtime=1398644763 v 1398644763
03:26:28 checking child log at /tmp/tw_9366_Kbf6.done mtime=1398644763 v 1398644763
03:26:33 checking child log at /tmp/tw_9366_Kbf6.done mtime=1398644763 v 1398644763
03:26:38 checking child log at /tmp/tw_9366_Kbf6.done mtime=1398644763 v 1398644763
03:26:43 checking child log at /tmp/tw_9366_Kbf6.done mtime=1398644763 v 1398644763
03:26:48 checking child log at /tmp/tw_9366_Kbf6.done mtime=1398644763 v 1398644763
03:26:53 checking child log at /tmp/tw_9366_Kbf6.done mtime=1398644763 v 1398644763
03:26:58 checking child log at /tmp/tw_9366_Kbf6.done mtime=1398644763 v 1398644763
03:27:03 checking child log at /tmp/tw_9366_Kbf6.done mtime=1398644763 v 1398644763
03:27:08 checking child log at /tmp/tw_9366_Kbf6.done mtime=1398644763 v 1398644763
03:27:13 checking child log at /tmp/tw_9366_Kbf6.done mtime=1398644763 v 1398644763
03:27:18 checking child log at /tmp/tw_9366_Kbf6.done mtime=1398644763 v 1398644763
03:27:23 checking child log at /tmp/tw_9366_Kbf6.done mtime=1398644763 v 1398644763
03:27:28 checking child log at /tmp/tw_9366_Kbf6.done mtime=1398644763 v 1398644763
03:27:33 checking child log at /tmp/tw_9366_Kbf6.done mtime=1398644763 v 1398644763
03:27:38 checking child log at /tmp/tw_9366_Kbf6.done mtime=1398644763 v 1398644763
03:27:43 checking child log at /tmp/tw_9366_Kbf6.done mtime=1398644763 v 1398644763
03:27:48 checking child log at /tmp/tw_9366_Kbf6.done mtime=1398644763 v 1398644763
03:27:53 checking child log at /tmp/tw_9366_Kbf6.done mtime=1398644763 v 1398644763
03:27:58 checking child log at /tmp/tw_9366_Kbf6.done mtime=1398644763 v 1398644763
03:28:03 checking child log at /tmp/tw_9366_Kbf6.done mtime=1398644763 v 1398644763
03:28:08 checking child log at /tmp/tw_9366_Kbf6.done mtime=1398644763 v 1398644763
03:28:08 Giving up on polling /tmp/tw_9366_Kbf6.done
03:31:03 get_updates starting upd=1
03:31:03 bg_setup starting upd=1
03:31:08 checking child log at /tmp/tw_9366_c3jr.done mtime=1398645063 v 0
03:31:13 checking child log at /tmp/tw_9366_c3jr.done no /tmp/tw_9366_c3jr.19390 v 1398645063
03:31:13 +063 teromiettinen@Twitter Loading home_timeline page 1
03:31:13 +064 page: $VAR1 = {};
How long have you been running twirssi, and how long has this problem been evident?
(I've heard a few recent reports from others on this subject.)
Are you on Debian?
The problem has occurred for maybe a year (I have been using Twirssi for about 3 years). The issue starts after anywhere from a day to a month of Irssi uptime. I'm not sure if computer uptime makes a difference.
Yes, I've been using Debian stable.
Same here:
- first occurrence of the problem was maybe a year ago
- the problem occurs randomly - sometimes after weeks of irssi uptime, sometimes after a day. Lately it's been more like a day than a month, which is why I'm starting to look into the issue :-)
- Debian stable
I'm currently checking two things (at once :-):
- put the waitpid( -1, WNOHANG) into a loop as shown in perldoc -f waitpid – otherwise, if there are multiple dead children, only one gets reaped per call (see the sketch after this list)
- remove the calls to Irssi::pidwait_remove(), because the irssi documentation (http://www.irssi.org/documentation/perl) makes it sound like children will be reaped automatically without this call: "it is usually not necessary to call this function"
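For reference, the reaping loop from perldoc -f waitpid looks roughly like this (just a sketch, the subroutine name is mine and this is not the actual twirssi code):

    use POSIX ":sys_wait_h";

    # Reap every child that has already exited. A single
    # waitpid(-1, WNOHANG) collects at most one dead child,
    # so looping until it returns 0 or -1 clears the backlog.
    sub reap_dead_children {
        my $kid;
        do {
            $kid = waitpid(-1, WNOHANG);
        } while $kid > 0;
    }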
Additional info: the child process seems to be stuck in an endless loop talking to Twitter:
% strace -p 26734 2>&1 | head -10
Process 26734 attached - interrupt to quit
read(13, 0xeeee83b, 5) = -1 EAGAIN (Resource temporarily unavailable)
read(13, 0xeeee83b, 5) = -1 EAGAIN (Resource temporarily unavailable)
read(13, 0xeeee83b, 5) = -1 EAGAIN (Resource temporarily unavailable)
read(13, 0xeeee83b, 5) = -1 EAGAIN (Resource temporarily unavailable)
read(13, 0xeeee83b, 5) = -1 EAGAIN (Resource temporarily unavailable)
read(13, 0xeeee83b, 5) = -1 EAGAIN (Resource temporarily unavailable)
read(13, 0xeeee83b, 5) = -1 EAGAIN (Resource temporarily unavailable)
read(13, 0xeeee83b, 5) = -1 EAGAIN (Resource temporarily unavailable)
read(13, 0xeeee83b, 5) = -1 EAGAIN (Resource temporarily unavailable)
% lsof -n -p 26734
COMMAND PID   USER  FD  TYPE DEVICE    SIZE/OFF NODE      NAME
[...]
irssi   26734 mitch 11u REG  0,18      0        676486626 /tmp/tw_8171_lX8K.26734
irssi   26734 mitch 12u IPv4 676487523 0t0      TCP       87.175.121.166:34436->199.16.156.199:https (CLOSE_WAIT)
irssi   26734 mitch 13u IPv4 676488332 0t0      TCP       87.175.121.166:34465->199.16.156.199:https (ESTABLISHED)
[...]
% dig -x 199.16.156.199 +nocomment
; <<>> DiG 9.8.4-rpz2+rl005.12-P1 <<>> -x 199.16.156.199 +nocomment
;; global options: +cmd
;199.156.16.199.in-addr.arpa.  IN  PTR
156.16.199.in-addr.arpa. 12  IN  SOA  ns1.p34.dynect.net. root.twitter.com. 7 3600 600 604800 60
;; Query time: 0 msec
;; SERVER: 192.168.0.105#53(192.168.0.105)
;; WHEN: Fri Apr 25 19:37:25 2014
;; MSG SIZE  rcvd: 115
Is it bad if I just kill the process as a workaround?
You can safely kill the processes, but you might accumulate zombies that way. At least the 100% CPU should be gone. A workaround is incoming :-)
For my workaround see #76. The patch still needs to be verified and tested, but on my box it has already successfully prevented two runaway child processes today. In the log it looks like this:
10:42 ERROR: Giving up on polling=/tmp/tw_19010_7yvj.done child_pid=28336 parent_pid=19010
10:42 ERROR: remove_child: child was still running, killing it now! child_pid=28336 parent_pid=19010
10:42 ERROR: Haven't been able to get updated tweets since 10:35
10:47 UPDATE: Update succeeded.
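The idea behind the patch, in rough outline (a sketch under assumptions; the function name and messages are illustrative and not the actual code from #76):

    # When the parent gives up polling the child's .done file, don't
    # just forget about the child - check whether it is still alive
    # and kill it, so it can't keep spinning at 100% CPU.
    sub remove_stuck_child {
        my ($child_pid) = @_;

        # kill(0, ...) sends no signal, it only tests that the pid exists
        if (kill(0, $child_pid)) {
            warn "remove_child: child was still running, killing it now! "
               . "child_pid=$child_pid parent_pid=$$\n";
            kill('KILL', $child_pid);
            waitpid($child_pid, 0);   # reap it so no zombie is left behind
        }
    }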
The root cause of the 100% CPU load seems to be a bug somewhere in the LWP modules, see https://rt.cpan.org/Public/Bug/Display.html?id=78920