twirssi
Irssi uses 100% CPU time
Hello,
I have this issue where Irssi starts using 100% CPU. The time seems to correlate with the following messages from the Twirssi debug log:
03:21:08   'timeline' => '460572445892571136'
03:21:08 }
03:21:08 };
03:26:03 get_updates starting upd=1
03:26:03 bg_setup starting upd=1
03:26:08 checking child log at /tmp/tw_9366_Kbf6.done mtime=1398644763 v 0
03:26:13 checking child log at /tmp/tw_9366_Kbf6.done mtime=1398644763 v 1398644763
03:26:18 checking child log at /tmp/tw_9366_Kbf6.done mtime=1398644763 v 1398644763
03:26:23 checking child log at /tmp/tw_9366_Kbf6.done mtime=1398644763 v 1398644763
03:26:28 checking child log at /tmp/tw_9366_Kbf6.done mtime=1398644763 v 1398644763
03:26:33 checking child log at /tmp/tw_9366_Kbf6.done mtime=1398644763 v 1398644763
03:26:38 checking child log at /tmp/tw_9366_Kbf6.done mtime=1398644763 v 1398644763
03:26:43 checking child log at /tmp/tw_9366_Kbf6.done mtime=1398644763 v 1398644763
03:26:48 checking child log at /tmp/tw_9366_Kbf6.done mtime=1398644763 v 1398644763
03:26:53 checking child log at /tmp/tw_9366_Kbf6.done mtime=1398644763 v 1398644763
03:26:58 checking child log at /tmp/tw_9366_Kbf6.done mtime=1398644763 v 1398644763
03:27:03 checking child log at /tmp/tw_9366_Kbf6.done mtime=1398644763 v 1398644763
03:27:08 checking child log at /tmp/tw_9366_Kbf6.done mtime=1398644763 v 1398644763
03:27:13 checking child log at /tmp/tw_9366_Kbf6.done mtime=1398644763 v 1398644763
03:27:18 checking child log at /tmp/tw_9366_Kbf6.done mtime=1398644763 v 1398644763
03:27:23 checking child log at /tmp/tw_9366_Kbf6.done mtime=1398644763 v 1398644763
03:27:28 checking child log at /tmp/tw_9366_Kbf6.done mtime=1398644763 v 1398644763
03:27:33 checking child log at /tmp/tw_9366_Kbf6.done mtime=1398644763 v 1398644763
03:27:38 checking child log at /tmp/tw_9366_Kbf6.done mtime=1398644763 v 1398644763
03:27:43 checking child log at /tmp/tw_9366_Kbf6.done mtime=1398644763 v 1398644763
03:27:48 checking child log at /tmp/tw_9366_Kbf6.done mtime=1398644763 v 1398644763
03:27:53 checking child log at /tmp/tw_9366_Kbf6.done mtime=1398644763 v 1398644763
03:27:58 checking child log at /tmp/tw_9366_Kbf6.done mtime=1398644763 v 1398644763
03:28:03 checking child log at /tmp/tw_9366_Kbf6.done mtime=1398644763 v 1398644763
03:28:08 checking child log at /tmp/tw_9366_Kbf6.done mtime=1398644763 v 1398644763
03:28:08 Giving up on polling /tmp/tw_9366_Kbf6.done
03:31:03 get_updates starting upd=1
03:31:03 bg_setup starting upd=1
03:31:08 checking child log at /tmp/tw_9366_c3jr.done mtime=1398645063 v 0
03:31:13 checking child log at /tmp/tw_9366_c3jr.done no /tmp/tw_9366_c3jr.19390 v 1398645063
03:31:13 +063 teromiettinen@Twitter Loading home_timeline page 1
03:31:13 +064 page: $VAR1 = {};
How long have you been running twirssi, and how long has this problem been evident?
(I've heard a few recent reports from others on this subject.)
Are you on Debian?
The problem has occurred for maybe a year (I have been using Twirssi for about 3 years). The issue starts after anywhere from a day to a month of Irssi uptime. I'm not sure if computer uptime makes a difference.
Yes, I've been using Debian stable.
Same here:
- first occurrence of the problem was maybe a year ago
- the problem occurs randomly - sometimes after weeks of irssi uptime, sometimes after a day. Lately it's been more like a day than a month, which is why I'm starting to look into the issue :-)
- Debian stable
I'm currently checking two things (at once :-):
- put the waitpid( -1, WNOHANG) into a loop as shown in perldoc -f waitpid – otherwise, if there are multiple dead children, only one gets reaped per call (see the sketch after this list)
- remove the calls to Irssi::pidwait_remove(), because the irssi documentation (http://www.irssi.org/documentation/perl) makes it sound like children will be reaped automatically without this call: "it is usually not necessary to call this function"
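For reference, the reaping loop from perldoc -f waitpid looks roughly like this (just a sketch, the subroutine name is mine and this is not the actual twirssi code):

    use POSIX ":sys_wait_h";

    # Reap every child that has already exited. A single
    # waitpid(-1, WNOHANG) collects at most one dead child,
    # so looping until it returns 0 or -1 clears the backlog.
    sub reap_dead_children {
        my $kid;
        do {
            $kid = waitpid(-1, WNOHANG);
        } while $kid > 0;
    }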
Additional info: the child process seems to be stuck in an endless loop talking to Twitter:
% strace -p 26734 2>&1 | head -10
Process 26734 attached - interrupt to quit
read(13, 0xeeee83b, 5) = -1 EAGAIN (Resource temporarily unavailable)
read(13, 0xeeee83b, 5) = -1 EAGAIN (Resource temporarily unavailable)
read(13, 0xeeee83b, 5) = -1 EAGAIN (Resource temporarily unavailable)
read(13, 0xeeee83b, 5) = -1 EAGAIN (Resource temporarily unavailable)
read(13, 0xeeee83b, 5) = -1 EAGAIN (Resource temporarily unavailable)
read(13, 0xeeee83b, 5) = -1 EAGAIN (Resource temporarily unavailable)
read(13, 0xeeee83b, 5) = -1 EAGAIN (Resource temporarily unavailable)
read(13, 0xeeee83b, 5) = -1 EAGAIN (Resource temporarily unavailable)
read(13, 0xeeee83b, 5) = -1 EAGAIN (Resource temporarily unavailable)
% lsof -n -p 26734
COMMAND PID   USER  FD  TYPE DEVICE    SIZE/OFF NODE      NAME
[...]
irssi   26734 mitch 11u REG  0,18      0        676486626 /tmp/tw_8171_lX8K.26734
irssi   26734 mitch 12u IPv4 676487523 0t0      TCP       87.175.121.166:34436->199.16.156.199:https (CLOSE_WAIT)
irssi   26734 mitch 13u IPv4 676488332 0t0      TCP       87.175.121.166:34465->199.16.156.199:https (ESTABLISHED)
[...]
% dig -x 199.16.156.199 +nocomment
; <<>> DiG 9.8.4-rpz2+rl005.12-P1 <<>> -x 199.16.156.199 +nocomment
;; global options: +cmd
;199.156.16.199.in-addr.arpa.  IN  PTR
156.16.199.in-addr.arpa. 12  IN  SOA  ns1.p34.dynect.net. root.twitter.com. 7 3600 600 604800 60
;; Query time: 0 msec
;; SERVER: 192.168.0.105#53(192.168.0.105)
;; WHEN: Fri Apr 25 19:37:25 2014
;; MSG SIZE  rcvd: 115
Is it bad if I just kill the process as a workaround?
You can safely kill the processes, but you might accumulate zombies that way. At least the 100% CPU should be gone. A workaround is incoming :-)
For my workaround see #76. The patch still needs to be verified and tested, but on my box it has already successfully prevented two runaway child processes today. In the log it looks like this:
10:42 ERROR: Giving up on polling=/tmp/tw_19010_7yvj.done child_pid=28336 parent_pid=19010
10:42 ERROR: remove_child: child was still running, killing it now! child_pid=28336 parent_pid=19010
10:42 ERROR: Haven't been able to get updated tweets since 10:35
10:47 UPDATE: Update succeeded.
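The idea behind the patch, in rough outline (a sketch under assumptions; the function name and messages are illustrative and not the actual code from #76):

    # When the parent gives up polling the child's .done file, don't
    # just forget about the child - check whether it is still alive
    # and kill it, so it can't keep spinning at 100% CPU.
    sub remove_stuck_child {
        my ($child_pid) = @_;

        # kill(0, ...) sends no signal, it only tests that the pid exists
        if (kill(0, $child_pid)) {
            warn "remove_child: child was still running, killing it now! "
               . "child_pid=$child_pid parent_pid=$$\n";
            kill('KILL', $child_pid);
            waitpid($child_pid, 0);   # reap it so no zombie is left behind
        }
    }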
The root cause of the 100% CPU load seems to be a bug somewhere in the LWP modules, see https://rt.cpan.org/Public/Bug/Display.html?id=78920